{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Kaggle - 泰坦尼克号\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "Kaggle 链接： [Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic)\n",
    "\n",
    "\n",
    "\n",
    "这虽然是一个入门级别的任务，但完整做完后可以学到以下技巧：\n",
    "\n",
    "1. 如何处理缺失数据\n",
    "2. 数据预处理流程\n",
    "3. 如何处理连续值的属性\n",
    "4. 如何处理离散值的属性\n",
    "\n",
    "另外，还可以实践一下机器学习任务一般流程：\n",
    "\n",
    "1. 搞清楚问题的定义\n",
    "2. 获取数据\n",
    "3. 分析、清洗数据\n",
    "4. 建模尝试解决这些问题\n",
    "5. 优化模型，并完成技术报告"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 数据分析\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# 可视化\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "%matplotlib inline\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 获取数据\n",
    "\n",
    "1992 年 Titanic 号在大西洋中不幸撞上冰山，数小时候沉入海底。Titanic 号上乘客和工作人员一共  2224 人，其中 1502 人丧生。这个数据集中包含了部分乘客的相关信息，目的是训练一个二分类器，来根据乘客的相关信息判断该乘客是否存活。这里使用的数据可以从 Kaggle 下载到。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "train = pd.read_csv('../data/titanic/train.csv')\n",
    "test = pd.read_csv('../data/titanic/test.csv')\n",
    "train_clone = train.copy(deep=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 观察数据\n",
    "\n",
    "获取到数据后，首先需要观察数据，对数据建立一个初步的认识。明确样本中有哪些特征，各个特征的含义，数据有没有缺失，等等。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "每一行表示一个样本，在这里就是一个乘客的信息。每一列是样本的一项特征，在这里每个样本有 12 个特征，其含义如下：\n",
    "    \n",
    "- PassengerId: 乘客的编号\n",
    "- Survived: 是否存活\n",
    "- Pclass: 船舱等级，取值范围 1,2,3\n",
    "- Name: 姓名\n",
    "- Sex: 性别\n",
    "- Age: 年龄\n",
    "- SibSp: 兄弟姐妹数量\n",
    "- Parch: 父母和孩子数量\n",
    "- Ticket: 船票编号\n",
    "- Fare: 票价\n",
    "- Cabin: 船舱号\n",
    "- Embarked: 登船的港口，泰坦尼克号在不同的港口停靠，不同的港口都有乘客登船 (C=Cherbourg, Q=Queenstown, S=Southampton)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 12 columns):\n",
      "PassengerId    891 non-null int64\n",
      "Survived       891 non-null int64\n",
      "Pclass         891 non-null int64\n",
      "Name           891 non-null object\n",
      "Sex            891 non-null object\n",
      "Age            714 non-null float64\n",
      "SibSp          891 non-null int64\n",
      "Parch          891 non-null int64\n",
      "Ticket         891 non-null object\n",
      "Fare           891 non-null float64\n",
      "Cabin          204 non-null object\n",
      "Embarked       889 non-null object\n",
      "dtypes: float64(2), int64(5), object(5)\n",
      "memory usage: 83.6+ KB\n",
      "None\n",
      "----------------------------------------\n",
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 418 entries, 0 to 417\n",
      "Data columns (total 11 columns):\n",
      "PassengerId    418 non-null int64\n",
      "Pclass         418 non-null int64\n",
      "Name           418 non-null object\n",
      "Sex            418 non-null object\n",
      "Age            332 non-null float64\n",
      "SibSp          418 non-null int64\n",
      "Parch          418 non-null int64\n",
      "Ticket         418 non-null object\n",
      "Fare           417 non-null float64\n",
      "Cabin          91 non-null object\n",
      "Embarked       418 non-null object\n",
      "dtypes: float64(2), int64(4), object(5)\n",
      "memory usage: 36.0+ KB\n",
      "None\n"
     ]
    }
   ],
   "source": [
    "print(train.info())\n",
    "print(\"-\" * 40)\n",
    "print(test.info())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 `info()` 方法可以从整体上对数据建立一些认识，可以知道数据有多少行，每个属性的数据类型，是否存在缺失值。从上面结果看出，训练集中有 891 个样本，其中 `Age`, `Cabin`, `Embarked` 三个属性存在缺失值。测试集中 Age Fare Cabin 三个属性存在缺失值。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 分析数据\n",
    "\n",
    "样本中 `PassengerId`, `Ticket` 和 `Cabin` 在此处感觉都没有用的信息，因为虽然船票编号和船舱编号可能决定船舱位置，但已经有相应的字段来说明这些隐含的信息了，所以决定去除这三个字段。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 幸存比例\n",
    "\n",
    "训练数据显示，只有 38.4% 的人幸存。即有 61.6% 的人丧生，因此我们的模型就算预测所有人的丧生，也能得到越 61% 的准确率。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.3838383838383838"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train['Survived'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sex\n",
    "\n",
    "观察性别与存活与否的关系："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7fd13d110940>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFANJREFUeJzt3X+wXGd93/H3xzKOB2NIQbc1ox9IBQFRwOD6Wi5NSkwxRE47UhogkexO8NRFwxTZnRLjmkJVKkJpRScUEpGipG4oExDGtKnIqFUSMAwxP6rrYGxko+RWBnQlVK4xP0ySWlz72z92dbJer+6uLB2tfPV+zdzRPmefPfu90tH93POcPc+TqkKSJIBzxl2AJOnMYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpce64CzhRixcvrhUrVoy7DEl6UrnzzjsfqKqJYf2edKGwYsUKpqamxl2GJD2pJPnGKP0cPpIkNQwFSVLDUJAkNVoNhSRrk+xPMp3k5gHPL09ye5IvJ7k7yc+1WY8kaX6thUKSRcB24CpgNbAxyeq+bm8Hbq2qS4ANwAfaqkeSNFybZwprgOmqOlBVR4GdwPq+PgU8vfv4GcDhFuuRJA3R5kdSlwAHe9ozwOV9fd4B/EGS64ELgCtbrEeSNESbZwoZsK1/7c+NwO9U1VLg54APJ3lcTUk2JZlKMjU7O9tCqZIkaPdMYQZY1tNeyuOHh64D1gJU1ReSnA8sBr7d26mqdgA7ACYnJ11UWlrgbrrpJo4cOcJFF13Etm3bxl3OWaXNM4W9wKokK5OcR+dC8q6+Pt8EXgmQ5CeA8wFPBaSz3JEjRzh06BBHjhwZdylnndZCoarmgM3AHuA+Op8y2pdka5J13W6/ArwhyVeAjwLXVpVnApI0Jq3OfVRVu4Hdfdu29Dy+F/ipNmuQJI3OO5olSQ1DQZLUMBQkSQ1DQZLUMBQkSQ1DQZLUMBQkSQ1DQZLUMBQkSQ1DQZLUaHWaC0kn5ptbXzzuEs4Icw8+EziXuQe/4d8JsHzLPaftvTxTkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUqPVUEiyNsn+JNNJbh7w/HuT3NX9+tMk32uzHknS/Fq7TyHJImA78CpgBtibZFd3CU4Aquqf9/S/HrikrXokScO1eaawBpiuqgNVdRTYCayfp/9G4KMt1iNJGqLNUFgCHOxpz3S3PU6S5wArgU+3WI8kaYg2QyEDttVx+m4AbquqRwbuKNmUZCrJ1Ozs7CkrUJL0WG2GwgywrKe9FDh8nL4bmGfoqKp2VNVkVU1OTEycwhIlSb3anBBvL7AqyUrgEJ0f/Ff3d0ryAuCvAV9osRZJTyKLz38UmOv+qdOptVCoqrkkm4E9wCLglqral2QrMFVVu7pdNwI7q+p4Q0uSzjI3Xuyn08el1amzq2o3sLtv25a+9jvarEGSNDrvaJYkNQwFSVLDUJAkNQwFSVLDUJAkNQwFSVLDUJAkNQwFSVLDUJAkNQwFSVLDUJAkNQwFSVLDUJAkNQwFSVLDUJAkNQwFSVLDUJAkNVoNhSRrk+xPMp3k5uP0+cUk9ybZl+QjbdYjSZpfa8txJlkEbAdeBcwAe5Psqqp7e/qsAt4K/FRVfTfJX2+rHknScG2eKawBpqvqQFUdBXYC6/v6vAHYXlXfBaiqb7dYjyRpiDZDYQlwsKc9093W6/nA85PckeSLSda2WI8kaYjWho+ADNhWA95/FXAFsBT4XJIXVdX3HrOjZBOwCWD58uWnvlJJEtDumcIMsKynvRQ4PKDP/6iqH1XV/cB+OiHxGFW1o6omq2pyYmKitYIl6WzXZijsBVYlWZnkPGADsKuvz+8BrwBIspjOcNKBFmuSJM2jtVCoqjlgM7AHuA+4tar2JdmaZF232x7gO0nuBW4H3lJV32mrJknS/Nq8pkBV7QZ2923b
0vO4gDd3vyRJY+YdzZKkhqEgSWoYCpKkhqEgSWoYCpKkhqEgSWoYCpKkhqEgSWoYCpKkhqEgSWoYCpKkhqEgSWoYCpKkhqEgSWoYCpKkhqEgSWoYCpKkhqEgSWq0GgpJ1ibZn2Q6yc0Dnr82yWySu7pf/6TNeiRJ82ttjeYki4DtwKuAGWBvkl1VdW9f149V1ea26pAkja7NM4U1wHRVHaiqo8BOYH2L7ydJOklthsIS4GBPe6a7rd9rktyd5LYkywbtKMmmJFNJpmZnZ9uoVZJEu6GQAduqr/1JYEVVXQz8EfChQTuqqh1VNVlVkxMTE6e4TEnSMW2GwgzQ+5v/UuBwb4eq+k5VPdxt/hZwaYv1SJKGmPdCc5KHePxv942qevo8L98LrEqyEjgEbACu7tv/s6vqW93mOuC+UYqWJLVj3lCoqgsBkmwFjgAfpjMsdA1w4ZDXziXZDOwBFgG3VNW+7r6mqmoXcEOSdcAc8CBw7cl9O5KkkzHqR1J/tqou72n/ZpIvAdvme1FV7QZ2923b0vP4rcBbR6xBktSyUa8pPJLkmiSLkpyT5BrgkTYLkySdfqOGwtXALwL/t/v1OvquD0iSnvxGGj6qqq/jjWeStOCNdKaQ5PlJPpXkq932xUne3m5pkqTTbdTho9+ic0H4RwBVdTedj5hKkhaQUUPhqVX1v/u2zZ3qYiRJ4zVqKDyQ5Ll0b2RL8lrgW/O/RJL0ZDPqfQpvAnYAL0xyCLifzg1skqQFZNRQ+EZVXZnkAuCcqnqozaIkSeMx6vDR/Ul2AH8b+GGL9UiSxmjUUHgBnamt30QnIH4jyU+3V5YkaRxGCoWq+suqurWqfgG4BHg68NlWK5MknXYjr6eQ5GeSfAD4E+B8OtNeSJIWkJEuNCe5H7gLuBV4S1X9eatVSZLGYtRPH72kqn7QaiWSpLEbtvLaTVW1DXhXksetwFZVN7RWmSTptBt2pnBsecyptguRJI3fsOU4P9l9eHdVfflEd55kLfA+Ostx/nZV/bvj9Hst8HHgsqoygCRpTEb99NGvJflakncm+clRXpBkEbAduApYDWxMsnpAvwuBG4AvjViLJKklo96n8ArgCmAW2JHknhHWU1gDTFfVgao6Cuxk8EI976Sz1vP/G7lqSVIrRr5PoaqOVNX7gTfS+XjqliEvWQIc7GnPdLc1klwCLKuq359vR0k2JZlKMjU7OztqyZKkEzTqyms/keQd3ZXXfgP4PLB02MsGbGs+wZTkHOC9wK8Me/+q2lFVk1U1OTExMUrJkqQnYNT7FP4L8FHg1VV1eMTXzADLetpLgd7XXgi8CPhMEoCLgF1J1nmxWZLGY2godC8Y/5+qet8J7nsvsCrJSuAQneU7rz72ZFV9H1jc8z6fAW40ECRpfIYOH1XVI8Czkpx3IjuuqjlgM7CHzv0Ot1bVviRbk6x7QtVKklo18iI7wB1JdgHNvEdV9WvzvaiqdgO7+7YNvEBdVVeMWIskqSWjhsLh7tc5dK4FSJIWoJFCoar+TduFSJLGb9Sps2+n5+Okx1TV3zvlFUmSxmbU4aMbex6fD7wGmDv15UiSxmnU4aM7+zbdkcTlOCVpgRl1+OiZPc1zgEk6N5tJkhaQUYeP7uSvrinMAV8HrmujIEnS+Axbee0y4GBVrey2X0/nesLXgXtbr06SdFoNu6P5g8BRgCQvB94NfAj4PrCj3dIkSafbsOGjRVX1YPfxLwE7quoTwCeS3NVuaZKk023YmcKiJMeC45XAp3ueG/V6hCTpSWLYD/aPAp9N8gDwl8DnAJI8j84QkiRpAZk3FKrqXUk+BTwb+IOqOvYJpHOA69suTpJ0eg0dAqqqLw7Y9qftlCNJGqeR12iWJC18hoIkqWEoSJIarYZCkrVJ9ieZTnLzgOffmOSeJHcl+eMkq9usR5I0v9ZCIckiYDtwFbAa2Djgh/5HqurFVfVSYBsw7/KekqR2tXmmsAaYrqoDVXUU2Ams7+1QVT/o
aV7AgIV8JEmnT5t3JS8BDva0Z4DL+zsleRPwZuA8YOBKbkk2AZsAli9ffsoLlSR1tHmmkAHbBi3pub2qngv8C+Dtg3ZUVTuqarKqJicmJk5xmZKkY9oMhRlgWU97KXB4nv47gZ9vsR5J0hBthsJeYFWSlUnOAzYAu3o7JFnV0/z7wJ+1WI8kaYjWrilU1VySzcAeYBFwS1XtS7IVmKqqXcDmJFcCPwK+C7y+rXokScO1Ov11Ve0Gdvdt29Lz+J+1+f6SpBPjHc2SpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpEars6TqzHbTTTdx5MgRLrroIrZt2zbuciSdAQyFs9iRI0c4dOjQuMuQdAZx+EiS1DAUJEmNVkMhydok+5NMJ7l5wPNvTnJvkruTfCrJc9qsR5I0v9ZCIckiYDtwFbAa2JhkdV+3LwOTVXUxcBvg1U5JGqM2zxTWANNVdaCqjgI7gfW9Harq9qr6i27zi8DSFuuRJA3RZigsAQ72tGe6247nOuB/tliPJGmINj+SmgHbamDH5B8Bk8DPHOf5TcAmgOXLl5+q+iRJfdo8U5gBlvW0lwKH+zsluRJ4G7Cuqh4etKOq2lFVk1U1OTEx0UqxkqR2zxT2AquSrAQOARuAq3s7JLkE+CCwtqq+3WItj3HpW/7r6XqrM9qFDzzEIuCbDzzk3wlw53t+edwlSGPX2plCVc0Bm4E9wH3ArVW1L8nWJOu63d4DPA34eJK7kuxqqx5J0nCtTnNRVbuB3X3btvQ8vrLN95cknRjvaJYkNQwFSVLDUJAkNQwFSVLDUJAkNQwFSVLDldfOYo+ed8Fj/pQkQ+Es9uerXj3uEiSdYRw+kiQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1Wg2FJGuT7E8yneTmAc+/PMmfJJlL8to2a5EkDddaKCRZBGwHrgJWAxuTrO7r9k3gWuAjbdUhSRpdm3MfrQGmq+oAQJKdwHrg3mMdqurr3ecebbEOSdKI2hw+WgIc7GnPdLdJks5QbYZCBmyrJ7SjZFOSqSRTs7OzJ1mWJOl42gyFGWBZT3spcPiJ7KiqdlTVZFVNTkxMnJLiJEmP12Yo7AVWJVmZ5DxgA7CrxfeTJJ2k1kKhquaAzcAe4D7g1qral2RrknUASS5LMgO8Dvhgkn1t1SNJGq7Vldeqajewu2/blp7He+kMK0mSzgDe0SxJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJahgKkqSGoSBJarQaCknWJtmfZDrJzQOe/7EkH+s+/6UkK9qsR5I0v9ZCIckiYDtwFbAa2JhkdV+364DvVtXzgPcC/76teiRJw7V5prAGmK6qA1V1FNgJrO/rsx74UPfxbcArk6TFmiRJ82gzFJYAB3vaM91tA/tU1RzwfeBZLdYkSZrHuS3ue9Bv/PUE+pBkE7Cp2/xhkv0nWZv+ymLggXEXcSbIf3j9uEvQY3lsHvOvT8kAynNG6dRmKMwAy3raS4HDx+kzk+Rc4BnAg/07qqodwI6W6jyrJZmqqslx1yH189gcjzaHj/YCq5KsTHIesAHY1ddnF3Ds17PXAp+uqsedKUiSTo/WzhSqai7JZmAPsAi4par2JdkKTFXVLuA/Ax9OMk3nDGFDW/VIkoaLv5if3ZJs6g7PSWcUj83xMBQkSQ2nuZAkNQwFNZJckeT3x12HFoYkNyS5L8nvtrT/dyS5sY19n83a/EiqpLPbPwWuqqr7x12IRueZwgKTZEWSryX57SRfTfK7Sa5MckeSP0uypvv1+SRf7v75ggH7uSDJLUn2dvv1T1EiHVeS/wT8TWBXkrcNOpaSXJvk95J8Msn9STYneXO3zxeTPLPb7w3d134lySeSPHXA+z03yf9KcmeSzyV5
4en9jhcOQ2Fheh7wPuBi4IXA1cBPAzcC/xL4GvDyqroE2AL82wH7eBud+0YuA14BvCfJBaehdi0AVfVGOjervgK4gOMfSy+ic3yuAd4F/EX3uPwC8MvdPv+tqi6rqpcA99GZSLPfDuD6qrqUznH+gXa+s4XP4aOF6f6qugcgyT7gU1VVSe4BVtC5c/xDSVbRmVbkKQP28WpgXc+Y7fnAcjr/KaUTcbxjCeD2qnoIeCjJ94FPdrffQ+eXGoAXJflV4MeBp9G596mR5GnA3wE+3jOf5o+18Y2cDQyFhenhnseP9rQfpfNv/k46/xn/YXcNi88M2EeA11SV80zpZA08lpJczvBjFeB3gJ+vqq8kuRa4om//5wDfq6qXntqyz04OH52dngEc6j6+9jh99gDXH5vKPMklp6EuLUwneyxdCHwryVOAa/qfrKofAPcneV13/0nykpOs+axlKJydtgHvTnIHnSlIBnknnWGlu5N8tduWnoiTPZb+FfAl4A/pXA8b5BrguiRfAfbx+LVbNCLvaJYkNTxTkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAXpBHTn8dmX5O4kd3VvwJIWDO9olkaU5GXAPwD+VlU9nGQxcN6Yy5JOKc8UpNE9G3igqh4GqKoHqupwkkuTfLY7Q+eeJM9Ocm53Zs8rAJK8O8m7xlm8NApvXpNG1J147Y+BpwJ/BHwM+DzwWWB9Vc0m+SXgZ6vqHyf5SeA24AY6d5FfXlVHx1O9NBqHj6QRVdUPk1wK/F06U0B/DPhVOtM//2F3ap9FwLe6/fcl+TCdmT9fZiDoycBQkE5AVT1CZ1bZz3SnIn8TsK+qXnacl7wY+B7wN05PhdLJ8ZqCNKIkL+iuQXHMS+msLzHRvQhNkqd0h41I8gvAs4CXA+9P8uOnu2bpRHlNQRpRd+jo1+ks9jIHTAObgKXA++lMSX4u8B+B/07nesMrq+pgkhuAS6vq9eOoXRqVoSBJajh8JElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpMb/B6YjS2XWc/h8AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.barplot(x='Sex',y='Survived',data=train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可见，性别与幸存率有很大关系。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Age\n",
    "\n",
    "年龄是连续值，不适合直接作为特征，需要将将其分段，划分为不同年龄的范围。这里我们将年龄划分为 5 个段，大约 15 岁一个段比较合适。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.42, 80.0)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train['Age'].min(), train['Age'].max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    326\n",
       "2    202\n",
       "0     83\n",
       "3     81\n",
       "4     22\n",
       "Name: AgeGroup, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train['AgeGroup'] = pd.cut(train['Age'], bins=[0, 15, 30, 45, 60, np.inf], labels=range(5))\n",
    "train['AgeGroup'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7fd13cdfe710>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFOZJREFUeJzt3X+QXWd93/H3x+sIB2NCg9WKWjLSgPjhGgfXiwN1xjbEBtG0VopNIhNi6JBqyCCg5ceO3TIqOMOkFdSkCQpBFLckLQhj2rJQNSrBhiRuAK3BNUhGVLUNWpmt5RiDTY1t2d/+ca+Or9ervXelPXsl7fs1c2fvOec553z3jLSfe55zz3NSVUiSBHDCsAuQJB09DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1Thx2AXN16qmn1sqVK4ddhiQdU26++eZ7qmppv3bHXCisXLmSiYmJYZchSceUJN8bpJ3dR5KkRquhkGRNkt1J9iS5coblH0pyS/f13ST3tVmPJGl2rXUfJRkBNgMXA5PAjiTjVbXrYJuq+mc97d8KnN1WPZKk/to8UzgX2FNVt1fVw8BWYO0s7S8HPtViPZKkPtoMhdOAvT3Tk915T5Lk2cAq4IYW65Ek9dFmKGSGeYd6os864PqqenTGDSXrk0wkmdi/f/+8FShJeqI2Q2ESWNEzvRy46xBt1zFL11FVbamq0aoaXbq079dsJUmHqc1Q2AGsTrIqyRI6f/jHpzdK8nzgbwB/1WItkqQBtPbto6o6kGQDsB0YAa6tqp1JrgYmqupgQFwObK1j6GHRY2NjTE1NsWzZMjZt2jTsciRp3rR6R3NVbQO2TZu3cdr0e9usoQ1TU1Ps27dv2GVI0rzzjmZJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUqPVUEiyJsnuJHuSXHmINr+WZFeSnUk+2WY9kqTZndjWhpOMAJuBi4FJYEeS8ara1dNmNXAVcF5V/TDJ32yrHklSf22eKZwL7Kmq26vqYWArsHZam38CbK6qHwJU1d0t1iNJ6qPNUDgN2NszPdmd1+t5wPOS3JTkq0nWtFiPJKmP1rqPgMwwr2bY/2rgQmA58BdJzqyq+56woWQ9sB7g9NNPn/9KJUlAu2cKk8CKnunlwF0ztPlcVT1SVXcAu+mExBNU1ZaqGq2q0aVLl7ZWsCQtdm2Gwg5gdZJVSZYA64DxaW3+K/BygCSn0ulOur3FmiRJs2it+6iqDiTZAGwHRoBrq2pnkquBiaoa7y57ZZJdwKPAu6vqrw93n+e8+4/no/S+TrnnfkaA799zf+v7vPkDV7S6fUnq1eY1BapqG7Bt2ryNPe8LeEf3JUkaMu9oliQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUsNQkCQ1DAVJUqPVx3FKWnzGxsaYmppi2bJlbNq0adjlaI4MBUnzampqin379g27DB0mu48kSY1WQyHJmiS7k+xJcuUMy9+YZH+SW7qv32qzHknS7FrrPkoyAmwGLgYmgR1Jxqtq17Smn66qDW3VIUkaXJtnCucCe6rq9qp6GNgKrG1xf5KkI9RmKJwG7O2ZnuzOm+7SJLcmuT7JihbrkST10WYoZIZ5NW3688DKqjoL+DPgEzNuKFmfZCLJxP79++e5TEnSQW2GwiTQ+8l/OXBXb4Oq+uuqeqg7+THgnJk2VFVbqmq0qkaXLl3aSrHSkRgbG+OKK65gbGxs2KVIR6TN+xR2AKuTrAL2AeuA1/U2SPKsqvpBd/IS4LYW65Fa43fzdbxoLRSq6kCSDcB2YAS4
tqp2JrkamKiqceBtSS4BDgD3Am9sq5759NiSk5/wU5KOF63e0VxV24Bt0+Zt7Hl/FXBVmzW04SerXznsEiSpFd7RLElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqGAqSpIahIElqtPrkNelocN4fnNf6Ppbct4QTOIG99+1dkP3d9NabWt+HFqdZQyHJ/UAdanlVPX3eK5IkDc2soVBVpwAkuRqYAv4ECPAbwCmtVydJWlCDXlN4VVX9YVXdX1U/rqqPAJe2WZgkaeENek3h0SS/AWyl0510OfBoa1VJ0nFgbGyMqakpli1bxqZNm4ZdzkAGPVN4HfBrwP/tvl7bnTerJGuS7E6yJ8mVs7S7LEklGR2wHh0lxsbGuOKKKxgbGxt2KdJRZ2pqin379jE1NTXsUgY20JlCVd0JrJ3LhpOMAJuBi4FJYEeS8araNa3dKcDbgK/NZfs6Ohz8R69jw1fOv6D1fTx44ggkPDg5uSD7u+DPv9L6PhaTgc4UkjwvyZeSfLs7fVaS9/RZ7VxgT1XdXlUP0+l6milYfgfYBPx0DnVLklowaPfRx4CrgEcAqupWYF2fdU4D9vZMT3bnNZKcDayoqi/MtqEk65NMJJnYv3//gCVLkuZq0FB4alV9fdq8A33WyQzzmnsekpwAfAh4Z7+dV9WWqhqtqtGlS5f2LVaSdHgGDYV7kjyH7h/1JJcBP+izziSwomd6OXBXz/QpwJnAl5PcCbwUGPdisyQNz6BfSX0LsAV4QZJ9wB10bmCbzQ5gdZJVwD463U3NN5aq6kfAqQenk3wZeFdVTQxcvSRpXg0aCt+rqouSnAycUFX391uhqg4k2QBsB0aAa6tqZ/fu6ImqGj/8siVJbRg0FO5I8qfAp4EbBt14VW0Dtk2bt/EQbS8cdLuSpHYMek3h+cCf0elGuiPJh5P8UntlSZKGYaBQqKoHq+q6qnoNcDbwdMA7RiTpODPwQ3aSXJDkD4FvACfRGfZCknQcGeiaQpI7gFuA64B3V9VPWq1KkjQUg15o/oWq+nGrlUiShq7fk9fGqmoT8P4kT3oCW1W9rbXKJEkLrt+Zwm3dn95QJkmLQL/HcX6++/bWqvrmAtSjefL9q1+0IPs5cO/PAydy4N7vtb7P0zd+q9XtH4l6avEYj1FPPeQjzaVjwqDXFK5J8izgM8DWqtrZYk3SMeeR8x4ZdgnSvBj0PoWXAxcC+4EtSb41wPMUJEnHmIHvU6iqqar6feDNdL6eOuNwFZKkY9egT157YZL3dp+89mHgf9IZCluSdBwZ9JrCvwc+Bbyyqu7q11iSdGzqGwpJRoD/U1X/dgHqkSQNUd/uo6p6FHhmkiULUI8kaYgGfsgOcFOScaAZ96iqrmmlKknSUAwaCnd1XyfQebayJOk4NFAoVNX72i5EkjR8gw6dfSMw04B4r5j3iiRJQzNo99G7et6fBFwKHJj/ciRJwzRo99HN02bdlMTHcUrScWbQO5p/vud1apI1wLIB1luTZHeSPUmunGH5m7vjKN2S5C+TnHEYv4MkaZ4M2n10M49fUzgA3Am8abYVuje9bQYuBiaBHUnGq2pXT7NPVtUfddtfAlwDrBm4eknSvOr35LWXAHuralV3+g10rifcCeyaZVWAc4E9VXV7d92twNre9aY94vNkZriYLenY8oyqJ/zUsaXfmcJHgYsAkpwP/C7wVuDFwBbgslnWPQ3Y2zM9Cfzi9EZJ3gK8A1gCzPhtpiTrgfUAp59+ep+StZBOPekx4ED3pwSvf9R/C8eyfqEwUlX3dt//OrClqj4LfDbJLX3WzQzzZvpa62Zgc5LXAe8B3jBDmy10QojR0VE/fhxF3nXWfcMuQdI86neheSTJweD4ZeCGnmX9AmUSWNEzvZzOXdGHshX41T7blCS1qF8ofAr4SpLPAQ8CfwGQ5LnAj/qsuwNYnWRVdzC9
dcB4b4Mkq3smfwX433OoXZI0z2b9tF9V70/yJeBZwP+oaq4cnUDn2sJs6x5IsgHYDowA11bVziRXAxNVNQ5sSHIR8AjwQ2boOpIkLZy+X0mtqq/OMO+7g2y8qrYB26bN29jz/u2DbEeStDAGfkazJOn4ZyhIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYShIkhqGgiSpYShIkhp9H7IjScejD7/z863v4757ftL8XIj9bfg3//CIt+GZgiSpYShIkhqGgiSpYShIkhqthkKSNUl2J9mT5MoZlr8jya4ktyb5UpJnt1mPJGl2rYVCkhFgM/Bq4Azg8iRnTGv2TWC0qs4Crgc2tVWPJKm/Ns8UzgX2VNXtVfUwsBVY29ugqm6sqv/XnfwqsLzFeiRJfbQZCqcBe3umJ7vzDuVNwH+faUGS9Ukmkkzs379/HkuUJPVqMxQyw7yasWHyemAU+MBMy6tqS1WNVtXo0qVL57FESVKvNu9ongRW9EwvB+6a3ijJRcC/AC6oqodarEeS1EebZwo7gNVJViVZAqwDxnsbJDkb+ChwSVXd3WItkqQBtBYKVXUA2ABsB24DrquqnUmuTnJJt9kHgKcBn0lyS5LxQ2xOkrQAWh0Qr6q2AdumzdvY8/6iNvcvSZob72iWJDUMBUlSw1CQJDUMBUlSw1CQJDUMBUlSw1CQJDUMBUlSw1CQJDUMBUlSw1CQJDUMBUlSw1CQJDUMBUlSw1CQJDUMBUlSw1CQJDUMBUlSw1CQJDUMBUlSw1CQJDVaDYUka5LsTrInyZUzLD8/yTeSHEhyWZu1SJL6ay0UkowAm4FXA2cAlyc5Y1qz7wNvBD7ZVh2SpMGd2OK2zwX2VNXtAEm2AmuBXQcbVNWd3WWPtViHJGlAbXYfnQbs7Zme7M6TJB2l2gyFzDCvDmtDyfokE0km9u/ff4RlSZIOpc1QmARW9EwvB+46nA1V1ZaqGq2q0aVLl85LcZKkJ2szFHYAq5OsSrIEWAeMt7g/SdIRai0UquoAsAHYDtwGXFdVO5NcneQSgCQvSTIJvBb4aJKdbdUjSQvt5CVP5+SnPIOTlzx92KUMrM1vH1FV24Bt0+Zt7Hm/g063kiQdd857zmuGXcKceUezJKlhKEiSGoaCJKlhKEiSGoaCJKlhKEiSGoaCJKlhKEiSGoaCJKlhKEiSGoaCJKlhKEiSGoaCJKlhKEiSGoaCJKlhKEiSGoaCJKlhKEiSGoaCJKlhKEiSGoaCJKnRaigkWZNkd5I9Sa6cYflTkny6u/xrSVa2WY8kaXathUKSEWAz8GrgDODyJGdMa/Ym4IdV9VzgQ8C/bqseSVJ/bZ4pnAvsqarbq+phYCuwdlqbtcAnuu+vB345SVqsSZI0izZD4TRgb8/0ZHfejG2q6gDwI+CZLdYkSZrFiS1ue6ZP/HUYbUiyHljfnXwgye4jrG0+nArc0/ZO8sE3tL2L+bAgx4J/edSfRC7McQDyNo9F4+jvXFiwY/HWa2Zd/OxBttFmKEwCK3qmlwN3HaLNZJITgZ8D7p2+oaraAmxpqc7DkmSiqkaHXcfRwGPR4XF4nMficcfasWiz+2gHsDrJqiRLgHXA+LQ248DBj8KXATdU1ZPOFCRJC6O1M4WqOpBkA7AdGAGuraqdSa4GJqpqHPg48CdJ9tA5Q1jXVj2SpP7a7D6iqrYB26bN29jz/qfAa9usoUVHVXfWkHksOjwOj/NYPO6YOhaxt0aSdJDDXEiSGobCHPUbumOxSHJtkruTfHvYtQxbkhVJbkxyW5KdSd4+7JqGJclJSb6e5H91j8X7hl3TsCUZSfLNJF8Ydi2DMBTmYMChOxaL/wCsGXYRR4kDwDur6oXAS4G3LOJ/Fw8Br6iqXwBeDKxJ8tIh1zRsbwduG3YRgzIU5maQoTsWhar6c2a4p2QxqqofVNU3uu/vp/MHYPrd+4tCdTzQnfyZ7mvR
XrhMshz4FeDfDbuWQRkKczPI0B1axLoj/Z4NfG24lQxPt7vkFuBu4ItVtWiPBfB7wBjw2LALGZShMDcDDcuhxSnJ04DPAv+0qn487HqGpaoeraoX0xnF4NwkZw67pmFI8g+Au6vq5mHXMheGwtwMMnSHFqEkP0MnEP5TVf3nYddzNKiq+4Avs3ivPZ0HXJLkTjpdza9I8h+HW1J/hsLcDDJ0hxaZ7nDvHwduq6rZhyQ7ziVZmuQZ3fc/C1wEfGe4VQ1HVV1VVcuraiWdvxU3VNXrh1xWX4bCHHSH9z44dMdtwHVVtXO4VQ1Hkk8BfwU8P8lkkjcNu6YhOg/4TTqfBG/pvv7+sIsakmcBNya5lc6HqC9W1THxVUx1eEezJKnhmYIkqWEoSJIahoIkqWEoSJIahoIkqWEoaNFJ8o+SVJIXHOF23pHkO0m+1R0V9JruTWzSMctQ0GJ0OfCXHMHjX5O8GXgl8NKqehHwEjpj/fzsDG1HDnc/0kLzPgUtKt3xiXYDLwfGq+oFSU4APgxcANxB58PStVV1fZJzgGuApwH3AG+sqh8k2QucX1V3HGI/D3TXexXwTuApwAfpPAJ3B/DbVfVQdwiE0aq6J8ko8MGqujDJe4Hn0BlwcQWwqao+1sIhkZ7AMwUtNr8K/GlVfRe4N8nfBV4DrAReBPwW8DJoxjP6A+CyqjoHuBZ4f5JTgKcdKhC6Tga+XVW/CEzQef7Er3fPKk4EfnuAWs+iM+zyy4CNSf72HH9Xac4MBS02l9MZnIzuz8uBXwI+U1WPVdUUcGN3+fOBM4EvdoeCfg+dQRBDz+i4SV7VHdriziR/rzv7UToD5B3czh3dIAL4BHD+ALV+rqoerKp7ujWdO/dfV5qbE4ddgLRQkjwTeAVwZpICRuj8cf8vh1oF2FlVL5thWz9Jsqqq7qiq7cD27uMWl3Sb/LSqHu3ZzqEc4PEPZydNWza9b9e+XrXOMwUtJpcBf1xVz66qlVW1gs41hHuAS5OckORvARd22+8GliZpupOS/J3ust8FPtIzImh48h/1g74DrEzy3O70bwJf6b6/Ezin+/7Saeut7T7z+JndmnYcxu8szYlnClpMLgf+1bR5nwVeSOdZGd8GvkvnqWk/qqqHk1wG/H6Sn6Pz/+X3gJ3AR4CnAl9L8hDwAHAT8M3pO62qnyb5x8Bnkhy80PxH3cXvAz6e5J/z5Ke1fR34b8DpwO9Ulc/uUOv89pFE51tJVfVA91P514HzutcXhlXPe4EHquqDw6pBi5NnClLHF7pdQUvofCofWiBIw+SZgiSp4YVmSVLDUJAkNQwFSVLDUJAkNQwFSVLDUJAkNf4/yovW3djKsJkAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.barplot(x='AgeGroup',y='Survived',data=train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "不同年龄段的幸存率存在差异，老年人幸存率很低，小孩幸存率则相对高一些。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pclass\n",
    "\n",
    "船舱等级（Pclass）可取值为：\n",
    "\n",
    "- 1 (Upper)\n",
    "- 2 (Middle) \n",
    "- 3 (Lower)\n",
    "\n",
    "\n",
    "船舱等级对幸存与否有很大影响，船舱的等级可能和距离甲板的距离有关，而距离甲板越近的船舱越有机会逃生。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7fd13cd8aa90>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAEvFJREFUeJzt3X2QXXd93/H3R+soBOM0BW9HHkvCCghaQVw82YjOeIYQYlK5mUiZ8lAZp4lnCBpmENAyoJg+qKCUaSsyMAlVMiiFhjABxbEzrcKoVVMwD3GxIxmEQRJKVPGgldggYQx26kaW/e0fe/XrZb3aeyXv0d213q+ZO3vPub977mfnzuxnz7n3d06qCkmSAJaMOoAkaeGwFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbli1AEu1NVXX13XXXfdqGNI0qJy//33n66q8UHjFl0pXHfddezfv3/UMSRpUUnyjWHGefhIktRYCpKkxlKQJDWdlkKSdUmOJDma5PZZHn9/kgO9218keajLPJKkuXX2QXOSMWAH8EpgEtiXZHdVHTo3pqr+ed/4NwM3dJVHkjRYl3sKa4GjVXWsqs4Au4ANc4y/Bfh4h3kkSQN0WQrXAsf7lid7654kyXOBVcCnOswjSRqgy1LILOvOd+3PjcCdVfX4rBtKNiXZn2T/qVOn5i2gJOkHdTl5bRJY0be8HDh5nrEbgTedb0NVtRPYCTAxMfG0vaj0li1bmJqaYtmyZWzfvn3UcSRdhroshX3A6iSrgBNM/+F/3cxBSV4I/G3g8x1mWRSmpqY4ceLEqGNIuox1dvioqs4Cm4G9wGHgjqo6mGRbkvV9Q28BdlXV03YPQJIWi07PfVRVe4A9M9ZtnbH8ri4zSJKG54xmSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWo6vcjOqP3kO35/1BEuyFWnH2YM+ObphxdV9vvf+8ujjiBpnrinIElqLAVJUmMpSJIaS0GS1HRaCknWJTmS5GiS288z5rVJDiU5mORjXeaRJM2ts28fJRkDdgCvBCaBfUl2V9WhvjGrgXcCN1bVd5P8na7ySJIG63JPYS1wtKqOVdUZYBewYcaYNwA7quq7AFX17Q7zSJIG6LIUrgWO9y1P9tb1ewHwgiT3JLk3yboO80iSBuhy8lpmWVezvP5q4OXAcuBzSV5cVQ/9wIaSTcAmgJUrV85/UkkS0O2ewiSwom95OXByljH/taoeq6qvAUeYLokfUFU7q2qiqibGx8c7CyxJl7suS2EfsDrJqiRLgY3A7hlj/gvwMwBJrmb6cNKxDjNJkubQWSlU1VlgM7AXOAzcUVUHk2xLsr43bC/wnSSHgLuBd1TVd7rKJEmaW6cnxKuqPcCeGeu29t0v4G29myRpxJzRLElqLAVJUmMpSJIaS0GS1FgKkqTmaX05zsXmiaVX/sBPSbrULIUF5K9X/9yoI0i6zHn4SJLUWAqSpMZSkCQ1loIkqfGDZmkebNmyhampKZYtW8b27dtHHUe6aJaCNA+mpqY4ceLEqGNIT5mHjyRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqem0FJKsS3IkydEkt8/y+G1JTiU50Lv9apd5JElz62xGc5IxYAfwSmAS2Jdkd1UdmjH0D6tqc1c5JEnD63JPYS1wtKqOVdUZYBewocPXkyQ9RV2WwrXA8b7lyd66mV6V5IEkdyZZ0WEeSdIAXZZCZllXM5b/BLiuqq4H/ifwkVk3lGxKsj/J/lOnTs1zTEnSOV2WwiTQ/5//cuBk/4Cq+k5V/U1v8XeBn5xtQ1W1s6omqmpifHy8k7CSpG5LYR+wOsmqJEuBjcDu/gFJrulbXA8c7jCPJGmAzr59VFVnk2wG
9gJjwIer6mCSbcD+qtoNvCXJeuAs8CBwW1d5JEmDdXqRnaraA+yZsW5r3/13Au/sMoMkaXjOaJYkNZaCJKmxFCRJTaefKUhPxTe3/cSoIwzt7IPPBq7g7IPfWFS5V2798qgjaIFxT0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqZnzhHhJHgbqfI9X1Y/OeyJJ0sjMWQpVdRVA7xKaU8BHgQC3Ald1nk6SdEkNe/joH1bVb1fVw1X1/ar6HeBVXQaTJF16w5bC40luTTKWZEmSW4HHuwwmSbr0hi2F1wGvBf6qd3tNb50k6WlkqFKoqq9X1YaqurqqxqvqF6vq64Oel2RdkiNJjia5fY5xr05SSSYuILskaZ4NVQpJXpDkk0m+0lu+Psm/GvCcMWAHcDOwBrglyZpZxl0FvAW470LDS5Lm17CHj34XeCfwGEBVPQBsHPCctcDRqjpWVWeAXcCGWcb9OrAd+L9DZpEkdWTYUnhmVf35jHVnBzznWuB43/Jkb12T5AZgRVV9YsgckqQOzTlPoc/pJM+jN5EtyauBbw14TmZZ1ybCJVkCvB+4bdCLJ9kEbAJYuXLlcImlS+jqZzwBnO39lBavYUvhTcBO4O8mOQF8jekJbHOZBFb0LS8HTvYtXwW8GPh0EoBlwO4k66tqf/+Gqmpn7/WZmJg47wxraVTefv1Do44gzYthS+EbVXVTkiuBJVX18BDP2QesTrIKOMH0ZxDta6xV9T3g6nPLST4NvH1mIUiSLp1hP1P4WpKdwD8AHhnmCVV1FtgM7AUOA3dU1cEk25Ksv6i0kqRODbun8ELgF5g+jPShJJ8AdlXVn831pKraA+yZsW7reca+fMgskqSODDt57dGquqOq/jFwA/CjwGc6TSZJuuSGvp5Ckp9O8tvAF4BnMH3aC0nS08hQh4+SfA04ANwBvKOq/rrTVJKkkRj2M4W/X1Xf7zSJJGnkBl15bUtVbQfek+RJ8wOq6i2dJZMkXXKD9hQO9346d0CSLgODLsf5J727D1TVFy9BHknSCA377aP3Jflqkl9P8qJOE0mSRmbYeQo/A7wcOAXsTPLlQddTkCQtPkPPU6iqqar6LeCNTH89ddaZyZKkxWvYK6/9vSTv6l157T8C/4vps55Kkp5Ghp2n8J+BjwM/V1UnBw2WJC1OA0uhd63l/11Vv3kJ8kiSRmjg4aOqehx4TpKllyCPJGmEhr7IDnBPkt1AO+9RVb2vk1SSpJEYthRO9m5LmL6MpiTpaWioUqiqd3cdRJI0esOeOvtuYLYT4r1i3hNJkkZm2MNHb++7/wzgVcDZ+Y8jSRqlYQ8f3T9j1T1JvBynJD3NDHv46Nl9i0uACWBZJ4kkSSMz7OGj+/n/nymcBb4OvH7Qk5KsA34TGAP+U1X9+xmPvxF4E/A48AiwqaoODZlJkjTP5py8luSnkiyrqlVV9ePAu4Gv9m5z/vHuzYTeAdwMrAFuSbJmxrCPVdVPVNVLgO2A8x4kaYQGzWj+IHAGIMnLgH8HfAT4HrBzwHPXAker6lhVnQF2ARv6B8y47vOVzPINJ0nSpTPo8NFYVT3Yu/9PgJ1VdRdwV5IDA557LXC8b3kSeOnMQUneBLwNWArM+hXXJJuATQArV64c8LKSpIs1aE9hLMm54vhZ4FN9jw0qlMyybra5Djuq6nnArwGzXrinqnZW1URVTYyPjw94WUnSxRr0h/3jwGeSnAYeBT4HkOT5TB9CmssksKJveTnTp8o4n13A7wzYpiSpQ3OWQlW9J8kngWuA/1FV5/7TXwK8ecC29wGrk6wCTgAbgdf1D0iyuqr+srf488BfIkkamYFfSa2qe2dZ9xdDPO9sks3AXqa/kvrhqjqYZBuwv6p2A5uT3AQ8BnwX+JUL/QUkSfNn2HkKF6Wq9gB7Zqzb2nf/rV2+viQNY8uWLUxNTbFs2TK2b98+6jgj1WkpSNJiMDU1
xYkTJ0YdY0EYeOU1SdLlw1KQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKnx3EeS5t2NH7hx1BEuyNKHlrKEJRx/6Piiyn7Pm++Z9226pyBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSU2npZBkXZIjSY4muX2Wx9+W5FCSB5J8Mslzu8wjSZpbZ6WQZAzYAdwMrAFuSbJmxrAvAhNVdT1wJ7C9qzySpMG63FNYCxytqmNVdQbYBWzoH1BVd1fV/+kt3gss7zCPJM2qnlk8ceUT1DNr1FFGrstzH10LHO9bngReOsf41wP/bbYHkmwCNgGsXLlyvvJJEgCP3fjYqCMsGF3uKWSWdbPWcJJfAiaA9872eFXtrKqJqpoYHx+fx4iSpH5d7ilMAiv6lpcDJ2cOSnIT8C+Bn66qv+kwjyRpgC73FPYBq5OsSrIU2Ajs7h+Q5Abgg8D6qvp2h1kkSUPorBSq6iywGdgLHAbuqKqDSbYlWd8b9l7gWcAfJTmQZPd5NidJugQ6vchOVe0B9sxYt7Xv/k1dvr4k6cI4o1mS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSp6bQUkqxLciTJ0SS3z/L4y5J8IcnZJK/uMoskabDOSiHJGLADuBlYA9ySZM2MYd8EbgM+1lUOSdLwruhw22uBo1V1DCDJLmADcOjcgKr6eu+xJzrMIUkaUpeHj64FjvctT/bWSZIWqC5LIbOsq4vaULIpyf4k+0+dOvUUY0mSzqfLUpgEVvQtLwdOXsyGqmpnVU1U1cT4+Pi8hJMkPVmXpbAPWJ1kVZKlwEZgd4evJ0l6ijorhao6C2wG9gKHgTuq6mCSbUnWAyT5qSSTwGuADyY52FUeSdJgXX77iKraA+yZsW5r3/19TB9WkiQtAM5oliQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlS02kpJFmX5EiSo0lun+XxH07yh73H70tyXZd5JElz66wUkowBO4CbgTXALUnWzBj2euC7VfV84P3Af+gqjyRpsC73FNYCR6vqWFWdAXYBG2aM2QB8pHf/TuBnk6TDTJKkOXRZCtcCx/uWJ3vrZh1TVWeB7wHP6TCTJGkOV3S47dn+46+LGEOSTcCm3uIjSY48xWwL2dXA6VGHuBD5jV8ZdYSFYtG9d/wbd8z7LLr3L2+5oPfvucMM6rIUJoEVfcvLgZPnGTOZ5ArgbwEPztxQVe0EdnaUc0FJsr+qJkadQxfO925x8/2b1uXho33A6iSrkiwFNgK7Z4zZDZz7N/PVwKeq6kl7CpKkS6OzPYWqOptkM7AXGAM+XFUHk2wD9lfVbuBDwEeTHGV6D2FjV3kkSYPFf8wXliSbeofLtMj43i1uvn/TLAVJUuNpLiRJjaWwQCT5cJJvJ/nKqLPowiRZkeTuJIeTHEzy1lFn0vCSPCPJnyf5Uu/9e/eoM42Sh48WiCQvAx4Bfr+qXjzqPBpekmuAa6rqC0muAu4HfrGqDo04mobQO4vClVX1SJIfAv4MeGtV3TviaCPhnsICUVWfZZY5Glr4qupbVfWF3v2HgcM8efa+Fqia9khv8Yd6t8v2v2VLQZpHvTP93gDcN9okuhBJxpIcAL4N/GlVXbbvn6UgzZMkzwLuAv5ZVX1/1Hk0vKp6vKpewvSZF9YmuWwP4VoK0jzoHYu+C/iDqvrjUefRxamqh4BPA+tGHGVkLAXpKep9UPkh4HBVvW/UeXRhkown+bHe/R8BbgK+OtpUo2MpLBBJPg58Hnhhkskkrx91Jg3tRuCfAq9IcqB3+0ejDqWhXQPcneQB
ps/Z9qdV9YkRZxoZv5IqSWrcU5AkNZaCJKmxFCRJjaUgSWosBUlSYylIMyR5vPe10q8k+aMkz5xj7LuSvP1S5pO6ZClIT/ZoVb2kd7baM8AbRx1IulQsBWlunwOeD5Dkl5M80Dvv/kdnDkzyhiT7eo/fdW4PI8lrensdX0ry2d66F/XO4X+gt83Vl/S3ks7DyWvSDEkeqapnJbmC6fMZ/Xfgs8AfAzdW1ekkz66qB5O8C3ikqn4jyXOq6ju9bfxb4K+q6gNJvgysq6oTSX6sqh5K8gHg3qr6gyRLgbGqenQkv7DUxz0F6cl+pHca5f3AN5k+r9ErgDur6jRAVc127YsXJ/lcrwRuBV7UW38P8HtJ3gCM9dZ9HvgXSX4NeK6FoIXiilEHkBagR3unUW56J70btFv9e0xfce1LSW4DXg5QVW9M8lLg54EDSV5SVR9Lcl9v3d4kv1pVn5rn30O6YO4pSMP5JPDaJM8BSPLsWcZcBXyrdxrtW8+tTPK8qrqvqrYCp4EVSX4cOFZVvwXsBq7v/DeQhuCegjSEqjqY5D3AZ5I8DnwRuG3GsH/N9BXXvgF8memSAHhv74PkMF0uXwJuB34pyWPAFLCt819CGoIfNEuSGg8fSZIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlS8/8AbX8xvecrbVoAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.barplot(x='Pclass',y='Survived',data=train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Passenger class (Pclass) clearly has a strong effect on survival: the survival rate differs markedly from one class to the next."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fare\n",
    "\n",
    "Fare is also a continuous value, so it needs to be bucketed into price ranges. First, look at how the fares are distributed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x7fd13cca1710>]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAGkZJREFUeJzt3XuQXOV95vHvr7vnftFIo9H9BmjACLxcPAYBdmKbmFvsQFJAoAgoLm20qWLXzpqyF/YWpzau2BsXELYcHNYkll22MXbsoCLYmBWQhAIDkgGhCzIjQGjQZUajYTQXzfR092//6HdGPdKABqmvZ55PMdXnvOftPu85ap458573nGPujoiIRFes1A0QEZHCUtCLiEScgl5EJOIU9CIiEaegFxGJOAW9iEjEKehFRCJOQS8iEnEKehGRiEuUugEAc+fO9RUrVpS6GSIiFWXz5s0H3b3tRPXKIuhXrFjBpk2bSt0MEZGKYma7p1NPXTciIhGnoBcRiTgFvYhIxCnoRUQiTkEvIhJxCnoRkYhT0IuIRJyCXkSkBJ5/o5e7f7mTV/a8W/B1lcUFUyIiM83/fnwnm3f3Ma+5lvOWthR0XdM6ojezt8zsVTN72cw2hbI5ZvaEmb0eXmeHcjOz+8ys08y2mNmFhdwAEZFKlMo4v3VmG3+0ennB1/VBum4+6e7nu3tHmL8T2Oju7cDGMA9wNdAeftYB9+ersSIikeGOFWlVp9JHfy2wPkyvB67LKf+uZ/0KaDGzhaewHhGRyHHAipT00w16B35pZpvNbF0om+/u+wDC67xQvhjYk/PerlAmIiKBO0U7op/uydjL3H2vmc0DnjCz196n7lRt9+MqZX9hrANYtmzZNJshIhINjmNFOqSf1hG9u+8Nr93Az4CLgAPjXTLhtTtU7wKW5rx9CbB3is98wN073L2jre2Et1MWEYmUYh7RnzDozazBzJrGp4ErgK3ABmBNqLYGeCRMbwBuC6NvVgP94108IiKS5V68PvrpdN3MB34W/sRIAD9w91+Y2YvAw2a2FngbuCHUfwy4BugEhoHP5b3VIiKRUJykP2HQu/sbwHlTlPcCl09R7sDteWmdiEhEHXfisoB0CwQRkRJw97IbXikiInlWNidjRUQk/4p5MlZBLyJSAo5jRTqmV9CLiJSAjuhFRCKuHO91IyIieeSurhsRkUhzKNqwGwW9iEiJaHiliEiUFfHSWAW9iEgJZE/Gqo9eRCSyvEIeJSgiIidJwytFRCKurB48IiIi+Vd2jxIUEZH80hG9iEjEuaMLpkREok63QBARiTA9YUpEZAZQH72ISITp4eAiIhGnB4+IiEScHiUoIhJxOqIXEYk43etGRCTivIiPmFLQi4iUhMbRi4hEmu51IyISceqjFxGZAcpueKWZxc3sJTN7NMyfZmbPm9nrZvYjM6sO5TVhvjMsX1GYpouIVC734l0b+0GO6L8A7MiZ/zpwj7u3A33A2lC+Fuhz95XAPaGeiIjkKLuuGzNbAvwu8O0wb8CngJ+EKuuB68L0tWGesPxyK9ZjVEREKkQ5noy9F/gykAnzrcC77p4K813A4jC9GNgDEJb3h/oiIhJkb1NcJn30ZvYZoNvdN+cWT1HVp7Es93PXmdkmM9vU09MzrcaKiERFud298jLg98zsLeAhsl029wItZpYIdZYAe8N0F7AUICyfBRw69kPd/QF373D3jra2tlPaCBGRilNO97px97vcfYm7rwBuAp5091uAp4DrQ7U1wCNhekOYJyx/0ot5ellEpAJkHxlbJl037+O/AF80s06yffAPhvIHgdZQ/kXgzlNroohI9BTzUYKJE1c5yt2fBp4O028AF01RZwS4IQ9tExGJrOwRfXHoylgRkRIpmz56ERHJv2KeuVTQi4iUgFNG4+hFRCT/yvHKWBERyaMiPmBKQS8iUhJeGePoRUTkJLkeJSgiEm3q
oxcRibiyux+9iIjkl7urj15EJMp0RC8iMgOoj15EJMJ0CwQRkZlAt0AQEYmm8WcxqetGRCSixrttdDJWRCSixrvnNbxSRCSiJrpudEQvIhJNR4/oi0NBLyJSZOqjFxGJOGe860Z99CIikgcKehGRIivmVbGgoBcRKRn10YuIRNTEyViNoxcRiaajJ2OLsz4FvYhIkR09oi8OBb2ISJFNXDClI3oRkWg6evdK9dGLiERS2R3Rm1mtmb1gZq+Y2TYz+4tQfpqZPW9mr5vZj8ysOpTXhPnOsHxFYTdBRKSylOM4+lHgU+5+HnA+cJWZrQa+Dtzj7u1AH7A21F8L9Ln7SuCeUE9ERMZN3OumTLpuPGswzFaFHwc+BfwklK8HrgvT14Z5wvLLrVhbIyJSQcpq1I2Zxc3sZaAbeALYBbzr7qlQpQtYHKYXA3sAwvJ+oHWKz1xnZpvMbFNPT8+pbYWISAVxitt3M62gd/e0u58PLAEuAs6eqlp4neqX1HFb5e4PuHuHu3e0tbVNt70iIhWvrG9T7O7vAk8Dq4EWM0uERUuAvWG6C1gKEJbPAg7lo7EiIlFQdg8eMbM2M2sJ03XA7wA7gKeA60O1NcAjYXpDmCcsf9K92OeYRUTK19FHCRYn6hMnrsJCYL2Zxcn+YnjY3R81s+3AQ2b2l8BLwIOh/oPA98ysk+yR/E0FaLeISMUq9jj6Ewa9u28BLpii/A2y/fXHlo8AN+SldSIiEaR73YiIRJxT3LOxCnoRkWLTEb2ISLSV3b1uRESkMHT3ShGRiCrHm5qJiEge6VGCIiIRp+GVIiIRlkpneHZXL1BGF0yJiEj+fO47L/Jvrx8EYFZdVVHWqaAXESmizu5B5jbW8MM/uZiV8xqLsk513YiIFNHgSIrPnreQ9vlN5fOEKRERyY9MxhlMpmiqLU6XzTgFvYhIkQyMpnCH5tri9por6EVEimRf/xEAmnVELyISTT976R0AFrXUFXW9CnoRkSIZHcsA8LH2uUVdr4JeRKRIjiTTzG+uKfp6FfQiIkUyPJamobr4ly8p6EVEimR4NEVddbzo61XQi4gUyXAyTb2CXkQkuoaTKerUdSMiEk0/fOFtXunqp0FH9CIi0ePu/OWj22msSXDpyuIOrQQFvYhIwR04PMpQMs2XrzqLW1cvL/r6FfQiIgW2u3cIgOWtDSVZv4JeRKTAeoeSAMxrKv7FUqCgFxEpuN7BUQBaG6tLsn4FvYhIAXUPjPBMZ/bRgXPqFfQiIpHzt0/t4vFtBzhzfiOJeGki94RrNbOlZvaUme0ws21m9oVQPsfMnjCz18Pr7FBuZnafmXWa2RYzu7DQGyEiUq6GRlPMb67h0f/08ZK1YTq/XlLAHe5+NrAauN3MVgF3AhvdvR3YGOYBrgbaw8864P68t1pEpEKkM051IkZ1onQdKCdcs7vvc/dfh+kBYAewGLgWWB+qrQeuC9PXAt/1rF8BLWa2MO8tFxGpAGMZpypW2l7yD7R2M1sBXAA8D8x3932Q/WUAzAvVFgN7ct7WFcpERGacVDpDIm4lbcO0g97MGoF/BP7M3Q+/X9UpynyKz1tnZpvMbFNPT890myEiUlHG0k6iEo7ozayKbMh/391/GooPjHfJhNfuUN4FLM15+xJg77Gf6e4PuHuHu3e0tbWdbPtFRMpaKlMBR/RmZsCDwA53vztn0QZgTZheAzySU35bGH2zGugf7+IREZlpUmknEStt0E/nxsiXAbcCr5rZy6HsvwJfAx42s7XA28ANYdljwDVAJzAMfC6vLRYRqSBj6UzJxs+PO2HQu/szTN3vDnD5FPUduP0U2yUiEgmpjFNbVQF99CIicnJS6UxlnIwVEZGTM5Z2qsr9ZKyIiJy8VEZH9CIikZbKePkPrxQRkZNXDsMrFfQiIgWUKoPhlQp6EZECGsvoZKyISGSNptL0DIzqZKyISFQ9t6sXQBdMiYhE1cHBJAC3rl5R0nYo6EVECqRvKBv0sxuq
StoOBb2ISIHsPzxCPGY01kzn/pGFU9q1i4hEUM/AKN959k0efOZN5jXVkL3be+ko6EVE8ux/PrKVn2/dT3U8xjduOK/UzVHQi4jk08hYmp9v3c8nzmrj7hvPZ05DdambpD56EZF8enzbfgAuXDa7LEIeFPQiInnVfXgUgD++bEVpG5JDQS8ikke9Q0mq4kZTiUfa5FLQi4jk0Zaud5ldX13ykTa5FPQiInmwc/8AN37rOZ7d1cuCWbWlbs4k5fO3hYhIBfvKhm288NYhbr5oGet+6/RSN2cSBb2ISB70Do3y8fa5/NUffLjUTTmOum5ERPLg8JEUC8usy2acgl5EJA/6j4zRXFvam5e9FwW9iMgp6h4Y4chYmll1CnoRkcjpHRzloq9uBKC1sabErZmagl5E5BR0dg8C8PnL27nugkUlbs3UFPQiIicpnXG++tgOAH7/gsXUV5fnQMbybJWISJnbtref+za+zpauflobqlkyu67UTXpPCnoRkQ+oe2CEz/6fZ8g4XLRiDt//k4upipdvB8kJW2Zmf29m3Wa2Nadsjpk9YWavh9fZodzM7D4z6zSzLWZ2YSEbLyJSCn/6vc1kHO6+8Twe/tNLyjrkYXp99N8Brjqm7E5go7u3AxvDPMDVQHv4WQfcn59mioiUj87uQeY11fAHFy4pdVOm5YRB7+7/Chw6pvhaYH2YXg9cl1P+Xc/6FdBiZgvz1VgRkVLLZJzB0RR/+NGlpW7KtJ3s3xvz3X0fQHidF8oXA3ty6nWFsuOY2Toz22Rmm3p6ek6yGSIixTUwmiLjlO3FUVPJd8fSVDdg9qkquvsD7t7h7h1tbW15boaISGEcPjIGzIygPzDeJRNeu0N5F5D798wSYO/JN09EpLw89OLbALTUl8fzYKfjZIN+A7AmTK8BHskpvy2MvlkN9I938YiIVCp354ntB/jSj1/hm0/toqk2werT55S6WdN2wnH0ZvZD4BPAXDPrAv4c+BrwsJmtBd4GbgjVHwOuATqBYeBzBWiziEjRuDt/+/Qu/vrxnQCsWtjMN244j6YyvVPlVE4Y9O5+83ssunyKug7cfqqNEhEpF//86r6JkN/ylSvK9lbE70dXxoqITOGRl9/hm0918psD2ZuWPXnHb1dkyIOCXkQEgD2Hhvn1233s7x9h0+4+nth+gKbaBGsuWc7VH17I6W2NpW7iSVPQi8iM1H9kjA0vv8MznQd58a0+Dg0lJ5Y11Sb4ePtcvvVHH6GhpvJjsvK3QETkAxgZS9PZPcjnH3qJN3qGqK+O84mz2jhn0SwuWzmXRS21tDXWYDbVZUGVSUEvIjPGcDLFrQ++wObdfQB89ffP5caOpWV/U7JTpaAXkUh7emc3v9x+gJGxNP+ys4feoSQ3fXQpn/l3i7hsZWukjtzfi4JeRCKrZ2CUP/6HF6mrijOnoZrWxmruuOIsbr5o6YwI+HEKehGJpJ6BUa68918BuO/mC/j0qvklblHpRLtjSkRmpL6hJHf8+BUODSX5u1s/MqNDHnRELyIRk844/+F7m3nhrUP89989myvPWVDqJpWcgl5EKlYyleGpnd08vbOb0bEMh4aTbOnq59BQki9deRb//uOnl7qJZUFBLyIVZSydobN7kNf2H+Z//NM2BkdT1CRizGuuoSoe4yPLZ/PbZ7Zxy8XLSt3UsqGgF5Gy1js4yl0/fZWdBwZIpZ2+4STDyfTE8i9deRZrP3YatVXxErayvCnoRaRs3f6DX/P41v2YwRXnLKA2EaemKsbZC5o4d/EsWhtqWNZaX+pmlj0FvYiUla6+Ya6//zkODSdJpjKcNb+Je286n7MXNpe6aRVLQS8iZeXZzl72Hx7htkuWM7+5ljWXrqAxAjcWKyXtPREpqWQqQ/fACOmM84ut+/nrx3fSXJvgK589h1hs5ly9WkgKehEpulQ6w/rndvOdZ9+kd3DyydXlrfV89boPK+TzSEEvIgXn7uztH+GdviM8+Vo3P3h+N4dH
Uqxa2Mwnz5rHqoXNVMVjNNTE+eSH5lGT0AiafFLQi0jB7Nh3mEde3su//KaHHfsOT5Qvm1PPFz99Jjd+dCn11YqhQtMeFpFpyWScwyNj7OsfYXfvMD0DI6QyTlffEQ4NJRkaTbGvf4ThZIqRsQzDyRR9w2PEDM5oa+TPP7uKZXPq6Vg+h1n1lfns1UqloBeZgTIZZyyTYWg0Pan88JExHt+2n96hJOmMk844qUz2StRX9vRzZCx93GdVxY0Fs2qpq4qzYFYdy1rrqauKU1cVZ3ZDNbdcvIz5zbXF2jSZgoJepEL0Hxnj4OAo7vDjTXvY1TNEV98wo6nMRChnfPJrKuNkMk7anUwG0qH8RKoTMarjMeIxIxEzGmoS/OFHl7J0Tj21VTFWLWxmcUsdVfEYddVxXZVa5hT0ImXo4OAoG17ey+bdfXQPjNA9MMru3uFJdRbOqmXlvEZa6quJG8RiRtyMeMwmT5uRiGdf4zGIW3Z5VTxGQ3V80gM4YjHjwmUtnLNoVrE3WQpIQS9SJOmM039kjLcPDXPg8Aid3YPsfTfbv909MMpbB4cYGUuTdieZypBxaG2opn1+I+cumsWNHUtZMrsOgLmNNVx6xsx4DJ6cOgW9SAG4O2NpZ2BkjJ9v3c//23GA7XsP0z0wOqlea0M1LfVVzGuq5WPtc2lrrCEWM6rjMa67YBFntDUqzOWUKehFOHq0fWgoSd9wMvs6lCSZzuCeDe6MwxsHB+kbGuPIWJojyTTDY2lGkmmGx1IcSWYYGUszMpYmdUw/+OlzG1i1qJk1K+awqKWW9nlNtDXV6CSlFIWCXirK4ZExegZGeXc4yVsHh3lpTx/DyWzoDoyk2Nt/BE58rvE4+/pHphxRcqyaRIwls+uoq86OKplVV8WC5hrqqxPUhpEmtVUxEvEY1XEjEY9x7qJZXHpGq670lJJR0EteZMIR8XA4oj36kz3KPZIzPdWy4WSaLV3vMjKWmfhMd5/IbA8T3QMj5B4sN9UkaK6ror46Tn1Ngg8taCIe++CPQl59RitnzmtkdkM1cxqqmV1fzeyGamoSMQwwMwyor4nrqk2pOAUJejO7CvgbIA58292/Voj1vJeh0RTPv9nLwEgqGyTJNCOpzKRwGUtncMYDJDv0zHHcIePZ6fAfGc+WO0cvGpko87A8+zETn5H9bA+flV3R+Pom6oT3cmw5OSGX8/nHvpecehPzOZ9BzvYdXX58gOau6z3rHPMZ49s6vjztR5dPV8ygtio7NK82EePMBU20NtQAMN4tbZOmjfqaOOctaWFWfRWtDdWcs2gWcR0pi7yvvAe9mcWBbwKfBrqAF81sg7tvz/e6pnIkmeaGbz3H9pzLrXNVx2PUVGXHCGcDxIhZNkxsYtrCtkDMLCw7Ot1UW0U8ZhNlhOUWAyP7ueOfZ3b0aHD8c47OW05Zbv3sNOPrn+K95NbPef94u8e3jUnvze6D3PfB5O2193pPznqn+oxEzJjdUE19GFOdG+C1VfHsWOtEtlujJnRxVMVNJxpFiqAQR/QXAZ3u/gaAmT0EXAsUJej/+dV9bN93mP917TlcunLupH7TmkRcR38iMuMUIugXA3ty5ruAiwuwHh5+cQ//99/emFTWO5SkuTbBLRcv18kvEREKE/RTpetxvbdmtg5YB7Bs2ck9rb2lvor2+Y2TytqBS8+Yq5AXEQkKEfRdwNKc+SXA3mMrufsDwAMAHR0dJzEgLvuw4CvOWXAybxURmTE++Di0E3sRaDez08ysGrgJ2FCA9YiIyDTk/Yje3VNm9h+Bx8kOr/x7d9+W7/WIiMj0FGQcvbs/BjxWiM8WEZEPphBdNyIiUkYU9CIiEaegFxGJOAW9iEjEKehFRCLO/IPecrAQjTDrAXaf5NvnAgfz2Jwo0D6ZTPtjMu2P41XqPlnu7m0nqlQWQX8qzGyTu3eUuh3lRPtkMu2PybQ/jhf1faKuGxGRiFPQi4hEXBSC/oFSN6AMaZ9Mpv0xmfbH
8SK9Tyq+j15ERN5fFI7oRUTkfVR00JvZVWa208w6zezOUrenGMxsqZk9ZWY7zGybmX0hlM8xsyfM7PXwOjuUm5ndF/bRFjO7sLRbUBhmFjezl8zs0TB/mpk9H/bHj8ItszGzmjDfGZavKGW7C8XMWszsJ2b2WviuXDKTvyNm9p/D/y9bzeyHZlY7k74jFRv0OQ8hvxpYBdxsZqtK26qiSAF3uPvZwGrg9rDddwIb3b0d2BjmIbt/2sPPOuD+4je5KL4A7MiZ/zpwT9gffcDaUL4W6HP3lcA9oV4U/Q3wC3f/EHAe2X0zI78jZrYY+DzQ4e7nkr19+k3MpO+Iu1fkD3AJ8HjO/F3AXaVuVwn2wyPAp4GdwMJQthDYGab/Drg5p/5Evaj8kH2K2UbgU8CjZB9neRBIHPtdIfuchEvCdCLUs1JvQ573RzPw5rHbNVO/Ixx9jvWc8G/+KHDlTPqOVOwRPVM/hHxxidpSEuFPyguA54H57r4PILzOC9Vmwn66F/gykAnzrcC77p4K87nbPLE/wvL+UD9KTgd6gH8I3VnfNrMGZuh3xN3fAb4BvA3sI/tvvpkZ9B2p5KCf1kPIo8rMGoF/BP7M3Q+/X9UpyiKzn8zsM0C3u2/OLZ6iqk9jWVQkgAuB+939AmCIo900U4n0PgnnIq4FTgMWAQ1ku6uOFdnvSCUH/bQeQh5FZlZFNuS/7+4/DcUHzGxhWL4Q6A7lUd9PlwG/Z2ZvAQ+R7b65F2gxs/EnqOVu88T+CMtnAYeK2eAi6AK63P35MP8TssE/U78jvwO86e497j4G/BS4lBn0HankoJ+RDyE3MwMeBHa4+905izYAa8L0GrJ99+Plt4WRFauB/vE/36PA3e9y9yXuvoLsd+BJd78FeAq4PlQ7dn+M76frQ/2KPlo7lrvvB/aY2Vmh6HJgOzP0O0K2y2a1mdWH/3/G98fM+Y6U+iTBKZ5kuQb4DbAL+G+lbk+RtvljZP+M3AK8HH6uIduHuBF4PbzOCfWN7OikXcCrZEcelHw7CrRvPgE8GqZPB14AOoEfAzWhvDbMd4blp5e63QXaF+cDm8L35J+A2TP5OwL8BfAasBX4HlAzk74jujJWRCTiKrnrRkREpkFBLyIScQp6EZGIU9CLiEScgl5EJOIU9CIiEaegFxGJOAW9iEjE/X8fCUQ0dLWlhAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "Fare = np.sort(train['Fare'].values)\n",
    "plt.plot(Fare)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plot above shows that most fares are below 100, so we split them into 5 ranges:\n",
    "\n",
    "```\n",
    "0  - 20\n",
    "20 - 40\n",
    "40 - 60\n",
    "60 - 80\n",
    "80 -\n",
    "```"
   ]
  },
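  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# include_lowest=True keeps fares of exactly 0 in the first bucket instead of turning them into NaN\n",
    "train['FareGroup'] = pd.cut(train['Fare'], bins=[0, 20, 40, 60, 80, np.inf], labels=range(5), include_lowest=True)"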
   ]
  },
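  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One detail of `pd.cut` is worth checking whenever the smallest bin edge can actually occur in the data: by default the intervals are closed on the right and the first one is open on the left, so a value equal to the lowest edge receives no label. The sketch below uses made-up fares (an assumption for illustration, not values taken from `train.csv`) to compare the default with `include_lowest=True`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Toy fares chosen for illustration only; they are not taken from train.csv.\n",
    "fares = pd.Series([0.0, 7.25, 20.0, 35.5, 80.0, 512.0])\n",
    "bins = [0, 20, 40, 60, 80, np.inf]\n",
    "\n",
    "# Default: intervals are (0, 20], (20, 40], ... so a fare of exactly 0 stays unlabeled (NaN).\n",
    "default_groups = pd.cut(fares, bins=bins, labels=range(5))\n",
    "\n",
    "# include_lowest=True closes the first interval on the left, so 0 lands in group 0.\n",
    "closed_groups = pd.cut(fares, bins=bins, labels=range(5), include_lowest=True)\n",
    "\n",
    "print(pd.DataFrame({'fare': fares, 'default': default_groups, 'include_lowest': closed_groups}))\n",
    "```\n",
    "\n",
    "Counting `NaN` values in the grouped column, for example with `isna().sum()`, is a quick way to confirm that no row fell outside the bins."
   ]
  },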
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7fd13ccd20f0>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAEh1JREFUeJzt3X+QXWV9x/H3J8EURdBR4sQSYhiNVmpVxkgdY9X6q2gt1NYfUNHaUjOdEW2ruNXRoUrrtI2t2lZwTNVqbYXij7ZpJx1qFdTBqoQfooB0UqKywRWQHyJFIeTbP+7l4bLZ7N6EPXuS7Ps1c2fPOfe553xzJrOffZ5zz3NSVUiSBLCk7wIkSfsOQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkpqD+i5gTx1++OG1evXqvsuQpP3KxRdffGNVLZ+r3X4XCqtXr2bLli19lyFJ+5Uk3xmnncNHkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLU7Hc3r0nS/mJiYoKpqSlWrFjBhg0b+i5nLIaCJHVkamqK7du3913GHnH4SJLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWdhkKS45JcnWRrkrfM8P6qJOcnuTTJ5Ule1GU9kqTZdRYKSZYCZwIvBI4GTkpy9LRmbwfOrapjgBOBs7qqR5I0ty57CscCW6vqmqq6EzgHOGFamwIOGy4/BLiuw3okSXPoMhSOAK4dWZ8cbhv1DuDkJJPAZuD1M+0oyfokW5JsueGGG7qoVZJEt6GQGbbVtPWTgI9W1UrgRcDHk+xSU1VtrKq1VbV2+fLlHZQqSYJuQ2ESOHJkfSW7Dg+dApwLUFX/DRwMHN5hTZKkWXQZChcBa5IclWQZgwvJm6a1+S7wXIAkj2cQCo4PSVJPOguFqtoBnAqcB1zF4FtGVyQ5I8nxw2ZvAl6b5OvA2cBrqmr6EJMkaYF0+ozmqtrM4ALy6LbTR5avBNZ1WYMkaXze0SxJajrtKUjSvur9b/q3zo9xy423t58LcbxT//JX7vc+7ClIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY1TZ0vzYGJigqmpKVasWMGGDRv6Lkfaa4aCNA+mpqbYvn1732VI95vDR5KkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIa71OQNK+8ke9ehyw77D4/9weGgqR55Y1891r36F/ru4Q95vCRJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJajoNhSTHJbk6ydYkb9lNm5cnuTLJFUk+0WU9kqTZdTYhXpKlwJnA84FJ4KIkm6rqypE2a4C3Auuq6uYkj+iqHknS3LrsKRwLbK2qa6rqTuAc4IRpbV4LnFlVNwNU1fUd1iNJmkOXoXAEcO3I+uRw26jHAo9NcmGSryQ5rsN6JElz6PJ5CplhW81w/DXAs4GVwJeSPKGqbrnPjpL1wHqAVatWzX+lkiSg257CJHDkyPpK4LoZ2vxrVd1VVduAqxmExH1U1caqWltVa5cvX95ZwZK02HUZChcBa5IclWQZcCKwaVqbfwF+ESDJ4QyGk67psCZJ0iw6C4Wq2gGcCpwHXAWcW1VXJDkjyfHDZucBP0hyJXA+8Oaq+kFXNUmSZtfpM5qrajOwedq200eWC3jj8CVJ6pl3NEuSmk57CtK+YN3frOv8GMtuWcYSlnDtLdcuyPEufP2FnR9Di5M9BUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVIz6x3NSW5j12cgNFV12LxXJEnqzayhUFWHAiQ5A5gCPs7g4TmvBA7tvDpJ0oIad/jol6rqrKq6rap+
WFUfAH69y8IkSQtv3Anx7k7ySuAcBsNJJwF3d1aVpE584ZnP6vwYdxy0FBLumJxckOM964tf6PwYi8m4PYXfAF4OfH/4etlwmyTpADJWT6Gqvg2c0G0pkqS+jdVTSPLYJJ9L8s3h+hOTvL3b0iRJC23c4aO/Bd4K3AVQVZcDJ3ZVlPYfExMTvPrVr2ZiYqLvUiTNg3EvND+oqr6WZHTbjg7q0X5mamqK7du3912GpHkybk/hxiSPZngjW5KXAt/rrCpJUi/G7Sm8DtgI/EyS7cA2BjewSZIOIOOGwneq6nlJDgGWVNVtXRYlSerHuMNH25JsBJ4G/KjDeiRJPRo3FB4H/BeDYaRtSd6f5BndlSVJ6sNYoVBVd1TVuVX1a8AxwGGA95ZL0gFm7OcpJHlWkrOAS4CDGUx7IUk6gIx1oTnJNuAy4FzgzVV1e6dVSZJ6Me63j55UVT/stBJJUu/mevLaRFVtAN6VZJcnsFXVGzqrTJK04ObqKVw1/Lml60IkSf2b63Gc/zZcvLyqLl2AeiRJPRr320fvSfKtJH+c5Gc7rUiS1Jtx71P4ReDZwA3AxiTf8HkK0r3qQcXOQ3ZSD9rl0pu0Xxn7PoWqmqqqvwZ+l8HXU0/vrCppP3PXuru48/l3cte6u/ouRbpfxn3y2uOTvGP45LX3A18GVnZamSRpwY17n8LfAWcDL6iq6zqsR/Pku2f83IIcZ8dNDwMOYsdN3+n8mKtO/0an+5c0RigkWQr8b1X91QLUI0nq0ZzDR1V1N/DwJMsWoB5JUo/GfsgOcGGSTUCb96iq3jPbh5IcB/wVsBT4UFX92W7avRT4JPDUqvJGOUnqybihcN3wtQQ4dJwPDIedzgSeD0wCFyXZVFVXTmt3KPAG4KvjFi1J6sZYoVBV79yLfR8LbK2qawCSnAOcAFw5rd0fAxuA0/biGJKkeTTu1NnnAzNNiPecWT52BHDtyPok8PPT9nsMcGRV/XsSQ0GSejbu8NHoL+yDgV8HdszxmcywrQVLkiXAe4HXzHXwJOuB9QCrVq2aq7kkaS+NO3x08bRNFyaZ63Gck8CRI+srGVyXuMehwBOAC5IArAA2JTl++sXmqtoIbARYu3at8whIUkfGHT562MjqEmAtg1/is7kIWJPkKGA7cCLwG/e8WVW3AoePHOMC4DS/fSRJ/Rl3+Ohi7h362QF8Gzhltg9U1Y4kpwLnMfhK6keq6ookZwBbqmrT3pUsSerKXE9eeypwbVUdNVz/TQbXE77Nrt8i2kVVbQY2T9s240R6VfXssSqWJHVmrjuaPwjcCZDkmcCfAh8DbmU4xi9JOnDMNXy0tKpuGi6/AthYVZ8GPp3ksm5LkyQttLl6CkuT3BMczwU+P/LeuNcjJC0iD63iYVU8tPyi4P5orl/sZwNfSHIjcAfwJYAkj2EwhCRJ93Hy3Tv7LkH3w6yhUFXvSvI54JHAf1a16F8CvL7r4rTvO/zgncCO4U9J+7s5h4Cq6iszbPufbsrR/ua0J97SdwmS5tHYz2iWJB34DAVJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKlxqoq9MDExwdTUFCtWrGDDhg19lyNJ88ZQ2AtTU1Ns37697zIkad45fCRJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSc0BdfPaU9789wtynENvvI2lwHdvvK3zY1787ld3un9JGmVPQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQcUHc0L5Sdyw65z09JOlAYCnvh9jUv6LsESeqEw0eSpMZQkCQ1hoIkqTEUJElNp6GQ5LgkVyfZmuQtM7z/xiRXJrk8yeeSPKrLeiRJs+ssFJIsBc4EXggcDZyU5OhpzS4F1lbVE4FPARu6qkeSNLcuewrHAlur6pqquhM4BzhhtEFVnV9V/zdc/QqwssN6JElz6DIUjgCuHVmfHG7bnVOA/+iwHknSHLq8eS0zbKsZGyYn
A2uBZ+3m/fXAeoBVq1bNV32SpGm67ClMAkeOrK8ErpveKMnzgLcBx1fVT2baUVVtrKq1VbV2+fLlnRQrSeo2FC4C1iQ5Ksky4ERg02iDJMcAH2QQCNd3WIskaQydhUJV7QBOBc4DrgLOraorkpyR5Phhs3cDDwY+meSyJJt2sztJ0gLodEK8qtoMbJ627fSR5ed1eXxJ0p7xjmZJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJElNp6GQ5LgkVyfZmuQtM7z/U0n+afj+V5Os7rIeSdLsOguFJEuBM4EXAkcDJyU5elqzU4Cbq+oxwHuBP++qHknS3LrsKRwLbK2qa6rqTuAc4IRpbU4APjZc/hTw3CTpsCZJ0iy6DIUjgGtH1ieH22ZsU1U7gFuBh3dYkyRpFgd1uO+Z/uKvvWhDkvXA+uHqj5JcfT9rmw+HAzd2fZD8xW92fYj5sCDngj/a5zuRC3MegLzBc9Hs+4MLC3YuXv+eWd9+1Dj76DIUJoEjR9ZXAtftps1kkoOAhwA3Td9RVW0ENnZU515JsqWq1vZdx77AczHgebiX5+Je+9u56HL46CJgTZKjkiwDTgQ2TWuzCbjnT+GXAp+vql16CpKkhdFZT6GqdiQ5FTgPWAp8pKquSHIGsKWqNgEfBj6eZCuDHsKJXdUjSZpbl8NHVNVmYPO0baePLP8YeFmXNXRonxrO6pnnYsDzcC/Pxb32q3MRR2skSfdwmgtJUmMo7KG5pu5YLJJ8JMn1Sb7Zdy19S3JkkvOTXJXkiiS/13dNfUlycJKvJfn68Fy8s++a+pZkaZJLk/x737WMw1DYA2NO3bFYfBQ4ru8i9hE7gDdV1eOBpwGvW8T/L34CPKeqngQ8GTguydN6rqlvvwdc1XcR4zIU9sw4U3csClX1RWa4p2QxqqrvVdUlw+XbGPwCmH73/qJQAz8arj5g+Fq0Fy6TrAR+GfhQ37WMy1DYM+NM3aFFbDjT7zHAV/utpD/D4ZLLgOuBz1bVoj0XwPuACWBn34WMy1DYM2NNy6HFKcmDgU8Dv19VP+y7nr5U1d1V9WQGsxgcm+QJfdfUhyQvBq6vqov7rmVPGAp7ZpypO7QIJXkAg0D4x6r6TN/17Auq6hbgAhbvtad1wPFJvs1gqPk5Sf6h35LmZijsmXGm7tAiM5zu/cPAVVU1+5RkB7gky5M8dLj8QOB5wLf6raofVfXWqlpZVasZ/K74fFWd3HNZczIU9sBweu97pu64Cji3qq7ot6p+JDkb+G/gcUkmk5zSd009Wge8isFfgpcNXy/qu6iePBI4P8nlDP6I+mxV7RdfxdSAdzRLkhp7CpKkxlCQJDWGgiSpMRQkSY2hIElqDAUtCknuHvm66GXD6SjmY78PTvKBJP87nAnz4iSvnY99S33o9Mlr0j7kjuHUC3skydKqunuWJh8CrgHWVNXOJMuB396L/Uj7BHsKWrSSrE7ypSSXDF9PH25/9vD5CJ8AvjHcdvLwOQGXJfngcNK3RzOYOfftVbUToKpuqKo/n2U/b0zyzeHr90fq+OZIXaclecdw+YIk70vy5eFnjl2wE6RFyZ6CFosHDmfuBNhWVS9hMIvn86vqx0nWAGcDa4dtjgWeUFXbkjweeAWwrqruSnIW8ErgFuDr9wTCbozu5ynAbwE/z2Byxa8m+QJw8xy1H1JVT0/yTOAjwKKcYE4Lw1DQYjHT8NEDgPcneTJwN/DYkfe+VlXbhsvPBZ4CXDSY5ogHMgiUS0Z3luRtwMuAR1TVT8+wn2cA/1xVtw/bfwb4BeaeP+tsGDzDIslhSR46
nGxOmneGghazPwC+DzyJwVDqj0feu31kOcDHquqtox9O8hjgSUmWVNXOqnoX8K4kP5plPzPZwX2Hcg+e9v70uWicm0ad8ZqCFrOHAN8bDv+8Cli6m3afA16a5BEASR6W5FFVtRXYAvzJ8FGtJDmY3f/y/yLwq0kelOQQ4CXAlxgE0yOSPDzJTwEvnva5Vwz3/Qzg1qq6dS//vdKc7CloMTsL+HSSlwHnc9+/6puqujLJ24H/TLIEuAt4HfAd4HeAdwNbk9wE3AH84W72c0mSjwJfG276UFVdCpDkDAZPa9vGrlNN35zky8BhzPDNJmk+OUuqtA9LcgFwWlVt6bsWLQ4OH0mSGnsKkqTGnoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktT8PxxVz3O2w6qwAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.barplot(x='FareGroup', y='Survived', data=train)"
   ]
  },
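  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Embarked\n",
    "\n",
    "The Titanic sailed from Southampton, England, called at Cherbourg-Octeville, France, and Cobh (then called Queenstown), Ireland, and was bound for New York, so passengers boarded at different ports. Embarked takes three values:\n",
    "\n",
    "- C (Cherbourg): a port city in northwestern France\n",
    "- Q (Queenstown): now Cobh, a port town in southern Ireland\n",
    "- S (Southampton): a port city in southern England\n",
    "\n",
    "Intuitively, the port of embarkation should have little to do with survival, but the statistics show that survival rates differ between passengers who boarded at different ports. My guess is that the port of embarkation is related to cabin location."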
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7fd13cc2eeb8>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAEttJREFUeJzt3X+w3Xdd5/HnK+mEUmh1tHcN0yQmU4JuhC4/LhG3rAUtmI7azkKFFnZKZ1gzzBjZWYRsWZiocdkfccVRCUgY0YoLoeCikc0alR8uVKu5hU7ZtITGtCU39Q4pLVBYSpv2vX+ck4+nl5t7T9L7zbk39/mYuXPP93s+53te7Znc1/n+TlUhSRLAslEHkCQtHJaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ154w6wKm68MILa+3ataOOIUmLyq233np/VY3NNW7RlcLatWuZmJgYdQxJWlSS3DvMODcfSZIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSs+hOXpMWoq1btzI1NcXKlSvZsWPHqONIp81SkObB1NQUR48eHXUM6Ulz85EkqbEUJEmNpSBJajothSSbkhxMcijJDScZ86okdyQ5kOSDXeaRJM2usx3NSZYDO4GXAZPA/iR7quqOgTHrgbcCl1bVg0n+WVd5JElz63JNYSNwqKoOV9UjwG7gqmljfh7YWVUPAlTVVzrMI0maQ5elcBFwZGB6sj9v0LOAZyW5OcktSTZ1mEeSNIcuz1PIDPNqhvdfD7wEWAV8Jsmzq+prT1hQshnYDLBmzZr5TypJArpdU5gEVg9MrwLum2HMn1bVo1V1N3CQXkk8QVXtqqrxqhofG5vzFqOSpNPUZSnsB9YnWZdkBXANsGfamD8BXgqQ5EJ6m5MOd5hJkjSLzkqhqo4DW4B9wJ3ATVV1IMn2JFf2h+0DvprkDuBTwFuq6qtdZZIkza7Tax9V1V5g77R52wYeF/Cm/o8kacQ8o1mS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpKbTy1xIT8aXtz9n1BGGdvyB7wPO4fgD9y6q3Gu2fWHUEbTAuKYgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkptNSSLIpycEkh5LcMMPz1yc5luS2/s+/7TKPJGl2nV06O8lyYCfwMmAS2J9kT1XdMW3oh6tqS1c5JEnD63JNYSNwqKoOV9UjwG7gqg7fT5L0JHVZChcBRwamJ/vzpntlktuTfDTJ6g7zSJLm0GUpZIZ5NW36z4C1VXUJ8FfAjTMuKNmcZCLJxLFjx+Y5piTphC5LYRIY/Oa/CrhvcEBVfbWqvtOffB/wgpkWVFW7qmq8qsbHxsY6CStp6dq6dSvXXXcdW7duHXWUkevyHs37gfVJ1gFHgWuA1wwOSPKMqvrH/uSVwJ0d5pGkGU1NTXH06NFRx1gQOiuFqjqeZAuwD1gOvL+qDiTZDkxU1R7gjUmuBI4DDwDXd5VHkjS3LtcUqKq9wN5p87YNPH4r8NYuM0iShucZzZKkxlKQJDWdbj6SlooLz30cON7/LS1eloI0D958yddGHUGaF24+kiQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlS02kpJNmU5GCSQ0lumGXc1UkqyXiXeSRJs+usFJIsB3YCVwAbgGuTbJhh3PnAG4G/6yqLJGk4Xa4pbAQOVdXhqnoE2A1cNcO4XwN2AA93mEWSNIQuS+Ei4MjA9GR/XpPkecDqqvr4bAtKsjnJRJKJY8eOzX9SSRLQbSlkhnnVnkyWAb8J/NJcC6qqXVU1
XlXjY2Nj8xhRkjSoy1KYBFYPTK8C7huYPh94NvDpJPcALwL2uLNZkkany1LYD6xPsi7JCuAaYM+JJ6vq61V1YVWtraq1wC3AlVU10WEmSdIszulqwVV1PMkWYB+wHHh/VR1Ish2YqKo9sy9B0mJ16e9cOuoIp2TF11awjGUc+dqRRZX95l+8ed6XOWspJHmIgf0A01XVBbO9vqr2Anunzdt2krEvmW1ZkqTuzVoKVXU+QP/b/RTwAXo7kF9Lb5+AJOksMuw+hZ+qqndX1UNV9Y2qeg/wyi6DSZLOvGH3KTyW5LX0TkAr4Frgsc5SLVFbt25lamqKlStXsmPHjlHHkbQEDVsKrwF+q/9TwM39eZpHU1NTHD16dNQxJC1hQ5VCVd3DzJeokCSdRYbap5DkWUk+keT/9qcvSfL2bqNJks60YXc0vw94K/AoQFXdTu9kNEnSWWTYUjivqv5+2rzj8x1GkjRaw5bC/Ukupn8iW5KrgX/sLJUkaSSGPfroF4BdwA8nOQrcTe8ENknSWWTYUri3qi5P8jRgWVU91GUoSdJoDLv56O4ku+hd3vqbHeaRJI3QsKXwQ8Bf0duMdHeSdyV5cXexJEmjMFQpVNW3q+qmqnoF8DzgAuCvO00mSTrjhr7JTpLLkrwb+BxwLvCqzlJJkkZiqB3NSe4GbgNuAt5SVd/qNNU8ecFb/nDUEU7J+fc/xHLgy/c/tKiy3/rr1406gqR5MuzRR/+iqr7RaRJJ0sjNdee1rVW1A3hHku+6A1tVvbGzZJKkM26uNYU7+78nug4iSRq9uW7H+Wf9h7dX1efPQB5J0ggNe/TRO5N8McmvJfmRThNJkkZm2PMUXgq8BDgG7EryBe+nIElnn6HPU6iqqar6beAN9A5P3dZZKknSSAx757V/nuRX+ndeexfwN8CqTpNJks64YdcUfh94EHh5VV1WVe+pqq/M9aIkm5IcTHIoyQ0zPP+G/qao25J8NsmGU8wvSZpHc5ZCkuXAP1TVb1XVfcMuuP+6ncAVwAbg2hn+6H+wqp5TVc8FdgDvHD66JGm+zVkKVfUY8P1JVpzisjcCh6rqcFU9AuwGrpq27MGzpJ9G/85ukqTRGPomO8DNSfYA7bpHVTXbN/uLgCMD05PAj04flOQXgDcBK4CfmGlBSTYDmwHWrFkzZGRJ0qkadp/CfcDH++PPH/iZTWaYN9OlMnZW1cXAfwBmPMy1qnZV1XhVjY+NjQ0ZWZJ0qoZaU6iqXz2NZU8CqwemV9Erl5PZDbznNN5HkjRPhr109qeY+Vv+jJt7+vYD65OsA44C1wCvmbbc9VV1V3/yp4G7kCSNzLD7FN488Phc4JXA8dleUFXHk2wB9gHLgfdX1YEk24GJqtoDbElyOfAovUNeX3eq/wFnk8dXPO0JvyXpTBt289Gt02bdnGTO23FW1V5g77R52wYe/7th3n+p+Nb6l486gqQlbtjNR983MLkMGAdWdpJIks6wOq94nMep8zwqftjNR7fyT/sUjgP3AK/vIpAknWmPXvroqCMsGHPdee2FwJGqWteffh29/Qn3AHd0nk6SdEbNdZ7Ce4FHAJL8OPBfgBuBrwO7uo0mSTrT5tp8tLyqHug/fjWwq6r+GPjjJLd1G02SdKbNtaawPMmJ4vhJ4JMDzw27P0KStEjM9Yf9Q8BfJ7kf+DbwGYAkz6S3CUmSdBaZtRSq6h1JPgE8A/iLqjpxBNIy4Be7DidJOrPm3ARUVbfMMO9L3cSRJI3S0PdoliSd/SwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKnptBSSbEpyMMmhJDfM8PybktyR5PYkn0jyg13mkSTNrrNSSLIc2AlcAWwArk2yYdqwzwPjVXUJ8FFgR1d5JElz63JNYSNwqKoOV9UjwG7gqsEBVfWpqvp//clbgFUd5pEkzaHLUrgIODIwPdmfdzKvB/53
h3kkSXOY8x7NT0JmmFczDkz+DTAOXHaS5zcDmwHWrFkzX/kkSdN0uaYwCawemF4F3Dd9UJLLgbcBV1bVd2ZaUFXtqqrxqhofGxvrJKwkqdtS2A+sT7IuyQrgGmDP4IAkzwPeS68QvtJhFknSEDorhao6DmwB9gF3AjdV1YEk25Nc2R/268DTgY8kuS3JnpMsTpJ0BnS5T4Gq2gvsnTZv28Djy7t8f0nSqfGMZklSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWo6LYUkm5IcTHIoyQ0zPP/jST6X5HiSq7vMIkmaW2elkGQ5sBO4AtgAXJtkw7RhXwauBz7YVQ5J0vDO6XDZG4FDVXUYIMlu4CrgjhMDquqe/nOPd5hDkjSkLjcfXQQcGZie7M+TJC1QXZZCZphXp7WgZHOSiSQTx44de5KxJEkn02UpTAKrB6ZXAfedzoKqaldVjVfV+NjY2LyEkyR9ty5LYT+wPsm6JCuAa4A9Hb6fJOlJ6qwUquo4sAXYB9wJ3FRVB5JsT3IlQJIXJpkEfg54b5IDXeWRJM2ty6OPqKq9wN5p87YNPN5Pb7OSJGkB8IxmSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJajothSSbkhxMcijJDTM8/5QkH+4//3dJ1naZR5I0u85KIclyYCdwBbABuDbJhmnDXg88WFXPBH4T+G9d5ZEkza3LNYWNwKGqOlxVjwC7gaumjbkKuLH/+KPATyZJh5kkSbPoshQuAo4MTE/25804pqqOA18Hvr/DTJKkWZzT4bJn+sZfpzGGJJuBzf3JbyY5+CSzLWQXAvePOsSpyH9/3agjLBSL7rPjl10xH7DoPr+88ZQ+vx8cZlCXpTAJrB6YXgXcd5Ixk0nOAb4HeGD6gqpqF7Cro5wLSpKJqhofdQ6dOj+7xc3Pr6fLzUf7gfVJ1iVZAVwD7Jk2Zg9w4mvm1cAnq+q71hQkSWdGZ2sKVXU8yRZgH7AceH9VHUiyHZioqj3A7wEfSHKI3hrCNV3lkSTNLX4xX1iSbO5vLtMi42e3uPn59VgKkqTGy1xIkhpLYYFI8rYkB5LcnuS2JD866kwaXpKVSXYn+YckdyTZm+RZo86luSVZleRPk9yV5HCSdyV5yqhzjYqlsAAk+THgZ4DnV9UlwOU88cQ/LWD9s/A/Bny6qi6uqg3AfwR+YLTJNJf+Z/c/gT+pqvXAeuCpwI6RBhuhLs9T0PCeAdxfVd8BqKpFdQKNeCnwaFX97okZVXXbCPNoeD8BPFxVvw9QVY8l+ffAvUneVlXfHG28M881hYXhL4DVSb6U5N1JLht1IJ2SZwO3jjqETsuPMO2zq6pvAPcAzxxFoFGzFBaA/reRF9C7lMcx4MNJrh9pKGlpCDNcWoeZL8GzJFgKC0RVPVZVn66qXwa2AK8cdSYN7QC9UtficwB4wqUtklxAb3/Q2XyNtZOyFBaAJD+UZP3ArOcC944qj07ZJ4GnJPn5EzOSvNDNgIvCJ4DzklwH7T4wvwG8q6q+PdJkI2IpLAxPB27sH8p4O72bEv3KaCNpWP3rdf1r4GX9Q1IP0Pv8pl8AUgvMwGd3dZK7gK8Cj1fVO0abbHQ8o1mS+pL8S+BDwCuqakkePGApSJIaNx9JkhpLQZLUWAqSpMZSkCQ1loKWjCSP9a9Ae+LnhlN47UuSfPxJvv+nk5zWPYDn4/2lYXhBPC0l366q547ijfsnRUkLnmsKWvKS3JPkPyf52yQTSZ6fZF//RLQ3DAy9IMnH+icZ/m6SZf3Xv6f/ugNJfnXacrcl
+SzwcwPzlyW5Mcl/6k+/vP/en0vykSRP78/flOSL/de/4oz8z9CSZyloKXnqtM1Hrx547khV/RjwGeAPgKuBFwHbB8ZsBH4JeA5wMf/0h/ptVTUOXAJcluSSgdc8XFUvrqrd/elzgP8BfKmq3p7kQuDtwOVV9XxgAnhTknOB9wE/C/wrYOU8/T+QZuXmIy0ls20+2tP//QXg6VX1EPBQkoeTfG//ub+vqsMAST4EvBj4KPCqJJvp/Xt6Br3LlNzef82Hp73Pe4GbBi6j8KL++Jt793thBfC3wA8Dd1fVXf33+yN6V9GVOmUpSD3f6f9+fODxiekT/06mn/5fSdYBbwZeWFUPJvkD4NyBMd+a9pq/AV6a5Deq6mF6l2j+y6q6dnBQkufO8H5S59x8JA1vY5J1/X0JrwY+C1xA7w//15P8AHDFHMv4PWAv8JEk5wC3AJcmeSZAkvP693b+IrAuycX9110749KkeeaagpaSpyYZvE3mn1fV0Iel0tus81/p7VP4P8DHqurxJJ+nd13+w8DNcy2kqt6Z5HuADwCvBa4HPjRws/i3V9WX+puk/leS++kV0LNPIat0WrwgniSpcfORJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1/x8I1vrLYq2XTAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.barplot(x='Embarked',y='Survived',data=train)"
   ]
  },
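  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to probe whether the port of embarkation is related to where passengers were accommodated is to cross-tabulate `Embarked` against `Pclass`, using passenger class as a rough proxy for cabin location (that proxy is an assumption). A minimal sketch follows; the rows are invented for illustration and say nothing about the real data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Invented rows for illustration only; they do not come from train.csv.\n",
    "toy = pd.DataFrame({\n",
    "    'Embarked': ['C', 'C', 'Q', 'S', 'S', 'S'],\n",
    "    'Pclass':   [1, 1, 3, 1, 3, 3],\n",
    "})\n",
    "\n",
    "# normalize='index' turns each row into shares that sum to 1,\n",
    "# i.e. the class mix among passengers who boarded at that port.\n",
    "class_mix = pd.crosstab(toy['Embarked'], toy['Pclass'], normalize='index')\n",
    "print(class_mix)\n",
    "```\n",
    "\n",
    "On the notebook's data the same call would be `pd.crosstab(train['Embarked'], train['Pclass'], normalize='index')`. If the class mix differs a lot between ports, part of the survival gap may simply reflect passenger class."
   ]
  },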
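  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SibSp and Parch\n",
    "\n",
    "The number of siblings and spouses aboard (SibSp) and the number of parents and children aboard (Parch) can together be treated as the number of family members. Family size may affect survival, and whether a passenger travelled alone may matter as well."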
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "train['FamilySize'] = train[['SibSp', 'Parch']].sum(axis=1) + 1  # +1 counts the passenger themselves\n",
    "train['Alone'] = train['FamilySize'] == 1"
   ]
  },
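  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides plotting, the two derived features can be summarised numerically with a group-by. The sketch below rebuilds `FamilySize` and `Alone` on a few hypothetical rows (the values are made up; only the column names follow the Kaggle schema) and computes the mean of `Survived` per group, which is the survival rate.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows for illustration; only the column names follow the Kaggle Titanic schema.\n",
    "toy = pd.DataFrame({\n",
    "    'SibSp':    [0, 1, 0, 3, 1, 0],\n",
    "    'Parch':    [0, 0, 0, 2, 1, 0],\n",
    "    'Survived': [0, 1, 1, 0, 1, 0],\n",
    "})\n",
    "\n",
    "# Family size counts siblings/spouses, parents/children, and the passenger.\n",
    "toy['FamilySize'] = toy['SibSp'] + toy['Parch'] + 1\n",
    "toy['Alone'] = toy['FamilySize'] == 1\n",
    "\n",
    "# Survived is 0/1, so its mean within a group is that group's survival rate.\n",
    "rate_by_size = toy.groupby('FamilySize')['Survived'].mean()\n",
    "rate_by_alone = toy.groupby('Alone')['Survived'].mean()\n",
    "print(rate_by_size)\n",
    "print(rate_by_alone)\n",
    "```\n",
    "\n",
    "With real data, groups containing only a handful of passengers give noisy rates, so it may help to look at the group sizes as well, for example with `toy['FamilySize'].value_counts()`."
   ]
  },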
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7fd138b11588>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAtQAAAEKCAYAAAAy8cIyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAHvRJREFUeJzt3Xu0XnV95/H3hyBFELVK2jiQGFojmiqCRrSlywu3Qi/QTlFBqDJDm+VaglaLGRwdFmWWM52jo9WKrfGurVJAnYk2Fa23qiOacBEkSI1cc+BIKIJAqRDynT+eHXw4OUmec/Z5Lifn/VrrrPPsvX9772/04ZdPfvu3905VIUmSJGlm9hh2AZIkSdJcZqCWJEmSWjBQS5IkSS0YqCVJkqQWDNSSJElSCwZqSZIkqQUDtSRJktSCgVqSJElqwUAtSZIktbDnsAuYrv3337+WLl067DIkadouv/zyO6tq4bDrGCT7bElzWa/99pwL1EuXLmX9+vXDLkOSpi3JzcOuYdDssyXNZb322075kCRJklowUEuSJEktGKglSZKkFgzUkiRJUgsGakmSJKkFA7UkSZLUgoFakiRJasFALUmSJLUw517sot3PqlWrmJiYYNGiRYyNjQ27HEkaCvtCae4yUGvoJiYmGB8fH3YZkjRU9oXS3OWUD0mSJKkFA7UkSZLUgoFakiRJasFALUmSJLVgoJYkSZJaMFBLkiRJLRioJUmSpBYM1JIkSVILBmpJkiSpBQO1JEmS1EJfA3WS45Jcn2RjknOm2L4kyVeTXJnk6iS/3c96JEmSpNnWt0CdZAFwAXA8sBw4JcnySc3eClxUVYcBJwPv61c9kiRJUj/0c4T6cGBjVd1QVQ8CFwInTmpTwOObz08AbutjPZIkSdKs27OPxz4AuLVreRPwgkltzgO+mOQsYF/g6D7WI0mSJM26fo5QZ4p1NWn5FOCjVXUg8NvAJ5JsV1OSlUnWJ1m/efPmPpQqSZIkzUw/A/UmYHHX8oFsP6XjDOAigKr6NrA3sP/kA1XV6qpaUVUrFi5c2KdyJUmSpOnrZ6BeByxLclCSvejcdLhmUptbgKMAkjyTTqB2CFqShqCHJzOdnmRzkquanz8eRp2SNGr6Noe6qrYkORO4FFgAfLiqrk1yPrC+qtYAfwZ8IMkb6EwHOb2qJk8LkST1WdeTmY6hc4VxXZI1VbVhUtO/r6ozB16gJI2wft6USFWtBdZOWndu1+cNwBH9rEGS1JNHnswEkGTbk5kmB+qBet6bPj7M0w/UfnfeywLgljvvnVd/7svf/qphlyC15psSJUkw9ZOZDpii3R82L+K6JMniKbZL0rxjoJYkQW9PZvocsLSqDgH+CfjYlAfyyUyS5hkDtSQJengyU1X9a1X9rFn8APC8qQ7kk5kkzTcGakkS9PBkpiRP6Vo8AbhugPVJ0sjq602JkqS5occnM70uyQnAFuAu4PShFSxJI8RALUkCenoy05uBNw+6LkkadU75kCRJklowUEuSJEktGKglSZKkFgzUkiRJUgsGakmSJKkFA7UkSZLUgoFakiRJasHnUGvWff1FL55W+wf2XAAJD2zaNO19X/zPX59We0mSpNnmCLUkSZLUgoFakiRJasEpH5IkjYCte+37qN+S5g4DtSRJI+D+ZccOuwRJM+SUD0mSJKkFA7UkSZLUgoFakiRJasFALUmSJLVgoJYkSZJa8Ckf89yqVauYmJhg0aJFjI2NDbscSZKkOcdAPc9NTEwwPj4+7DIkSZLmLKd8SJIkSS0YqCVJkqQWDNSSJElSCwZqSZIkqQUDtSRJktSCgVqSJElqwUAtSZIktWCgliRJklowUEuSJEktGKglSQAkOS7J9Uk2JjlnJ+1OSlJJVgyyPkkaVb56fEhWrVrFxMQEixYtYmxsbNjlSJrnkiwALgCOATYB65KsqaoNk9rtB7wO+M7gq5Sk
0eQI9ZBMTEwwPj7OxMTEsEuRJIDDgY1VdUNVPQhcCJw4Rbv/DowB/z7I4iRplBmoJUkABwC3di1vatY9IslhwOKq+vwgC5OkUWegliQBZIp19cjGZA/gXcCf7fJAycok65Os37x58yyWKEmjqa+BupcbXJK8PMmGJNcm+WQ/65Ek7dAmYHHX8oHAbV3L+wHPAr6W5CbghcCaqW5MrKrVVbWiqlYsXLiwjyVL0mjo202JvdzgkmQZ8GbgiKr6SZJf6lc9kqSdWgcsS3IQMA6cDLxy28aqugfYf9tykq8BZ1fV+gHXKUkjp58j1L3c4PInwAVV9ROAqrqjj/VIknagqrYAZwKXAtcBF1XVtUnOT3LCcKuTpNHWz8fmTXWDywsmtXk6QJJvAQuA86rqC32sSZK0A1W1Flg7ad25O2j7kkHUJElzQT8D9U5vcOk6/zLgJXTm630jybOq6u5HHShZCawEWLJkyexXKkmSJM1QP6d87OoGl21t/m9VPVRVNwLX0wnYj+INLpIkSRpV/QzUj9zgkmQvOje4rJnU5v8ALwVIsj+dKSA39LEmSZIkaVb1bcpHVW1Jsu0GlwXAh7fd4AKsr6o1zbZjk2wAHgbeVFX/2q+aJPVu1apVTExMsGjRIsbGxoZdjiRJI6ufc6h3eYNLVRXwxuZH0giZmJhgfHx82GVIkjTy+hqoNVhH/NUR095nr7v3Yg/24Na7b53W/t8661vTPpckSdLuyFePS5IkSS0YqCVJkqQWDNSSJElSCwZqSZIkqQUDtSRJktSCT/mYBbec/+xp77PlricBe7Llrpuntf+Sc6+Z9rkkSZLUPwZqDd0Tqx71W5IkaS4xUGvoTnt467BLkCRJmjHnUEuSJEktOEItSZI0D6xatYqJiQkWLVrE2NjYsMvZrew0UCe5F9jhxNaqevysVyRJmhH7bEk7MzExwfj4+LDL2C3tNFBX1X4ASc4HJoBPAAFOBfbre3WSpJ7ZZ0vScPQ6h/q3qup9VXVvVf20qv4a+MN+FiZJmjH7bEkaoF4D9cNJTk2yIMkeSU4FHu5nYZKkGbPPlqQB6jVQvxJ4OfDj5udlzTpJ0uixz5akAerpKR9VdRNwYn9LkSTNBvtsSRqsnkaokzw9yZeTfL9ZPiTJW/tbmiRpJuyzJWmwep3y8QHgzcBDAFV1NXByv4qSJLViny1JA9RroN6nqr47ad2W2S5GkjQr7LMlaYB6DdR3JvlVmhcGJDkJuL1vVUmS2rDPlqQB6vXV468FVgPPSDIO3EjnRQGSpNFjny1JA9RroL65qo5Osi+wR1Xd28+iJEmt2GdL0gD1OuXjxiSrgRcC9/WxHklSe/bZkjRAvQbqg4F/onMZ8cYk703ym/0rS4NS+xRb991K7VPDLkXS7JlRn53kuCTXJ9mY5Jwptr8myTVJrkryzSTL+1C7JM05PQXqqnqgqi6qqv8IHAY8Hvh6XyvTQDx0xEM8eMyDPHTEQ8MuRdIsmUmfnWQBcAFwPLAcOGWKwPzJqnp2VR0KjAHvnP3qJWnu6XUONUleDLyCTme7js5rbTVD+++9FdjS/Jak2TWDPvtwYGNV3dDsfyGdty1u2Nagqn7a1X5fmqeISHPVLec/e9glDNSWu54E7MmWu26eN3/2JedeM5Dz9BSok9wIXAVcBLypqu7va1XzwNmH3D3sEiTtpmbYZx8A3Nq1vAl4wRTHfi3wRmAv4MgdnH8lsBJgyZIl06pdkuaiXkeonzNpZEKSNLpm0mdninXbjUBX1QXABUleCbwVePUUbVbTeWwfK1ascBRb0m5vp4E6yaqqGgPelmSqjvV1fatMkjQtLfvsTcDiruUDgdt20v5C4K9nVKgk7WZ2NUJ9XfN7fb8LkSS11qbPXgcsS3IQMA6cDLyyu0GSZVX1w2bxd4AfIknaeaCuqs81H6+uqisHUI8kaYba9NlVtSXJmcClwALgw1V1bZLzgfVVtQY4M8nRwEPAT5hiuockzUe9
zqF+Z5KnABcDF1bVtX2sSZLUzoz67KpaC6ydtO7crs+vn9UqJWk30etzqF8KvATYDKxuHuz/1n4WJkmaGftsSRqsXt+USFVNVNV7gNfQeRzTubvYRZI0JPbZkjQ4PQXqJM9Mcl6S7wPvBf4fnTvAJUkjxj5bkgar1znUHwE+BRxbVTt7jJIkafjssyVpgHYZqJMsAH5UVe8eQD2SpBbssyVp8HY55aOqHgaenGSvAdQjSWrBPluSBq/XKR83A99Ksga4f9vKqnrnznZKchzwbjrPNP1gVf3FDtqdROfxTs+vKl8iI0ntzKjPliTNTK+B+rbmZw9gv152aC47XgAcQ+eVtuuSrKmqDZPa7Qe8DvhOr0VLknZq2n22JGnmegrUVfXnMzj24cDGqroBIMmFwInAhknt/jswBpw9g3NIkiaZYZ8taTe3/95bgS3Nb82mngJ1kq8CNXl9VR25k90OAG7tWt4EvGDScQ8DFlfV55MYqDVUq1atYmJigkWLFjE2NjbscqQZm2GfLWk3d/Yhdw+7hN1Wr1M+usPu3sAfAlt2sU+mWPdIB59kD+BdwOm7OnmSlcBKgCVLluyquTQjExMTjI+PD7sMaTbMpM+WJM1Qr1M+Lp+06ltJvr6L3TYBi7uWD6Qzp2+b/YBnAV9LArAIWJPkhMk3JlbVamA1wIoVK7YbdZEk/dwM+2xJ0gz1OuXjSV2LewAr6ATgnVkHLEtyEDAOnAy8ctvGqroH2L/rHF8DzvYpH5LUzgz7bEnSDPU65eNyfj5dYwtwE3DGznaoqi1JzgQupfPYvA9X1bVJzgfWV9WamZUsSdqFaffZkqSZ22mgTvJ84NaqOqhZfjWduXg3sf3TOrZTVWuBtZPWnbuDti/pqWJJ0pTa9tmSpJnZ1ZsS3w88CJDkRcD/BD4G3EMzp1mSNDLssyVpCHY15WNBVd3VfH4FsLqqPg18OslV/S1NkjRN9tmSNAS7GqFekGRb6D4K+ErXtl7nX0uSBsM+W5KGYFcd7KeArye5E3gA+AZAkqfRuYQoSRod9tmSNAQ7DdRV9bYkXwaeAnyxqrbdNb4HcFa/i5Mk9c4+W5KGY5eXAKvqsinW/Ut/ypEktWGfLUmDt6s51JIkSZJ2Yt7dpLJq1SomJiZYtGgRY2Njwy5HkiRJc9y8C9QTExOMj48PuwxJkiTtJpzyIUmSJLVgoJYkSZJamHdTPqT56Lq3fWXXjSZ58K4HHvk9nf2f+ZYjp30uSZLmMkeoJUmSpBYM1JIkAJIcl+T6JBuTnDPF9jcm2ZDk6iRfTvLUYdQpSaPGQC1JIskC4ALgeGA5cEqS5ZOaXQmsqKpDgEsAnz0qSRioJUkdhwMbq+qGqnoQuBA4sbtBVX21qv6tWbwMOHDANUrSSDJQS5IADgBu7Vre1KzbkTOAf5xqQ5KVSdYnWb958+ZZLFGSRpOBWpIEkCnW1ZQNk9OAFcDbp9peVaurakVVrVi4cOEslihJo2lOPzbveW/6+LT32e/Oe1kA3HLnvdPe//K3v2ra55OkOWITsLhr+UDgtsmNkhwNvAV4cVX9bEC1SdJIc4RakgSwDliW5KAkewEnA2u6GyQ5DHg/cEJV3TGEGiVpJBmoJUlU1RbgTOBS4Drgoqq6Nsn5SU5omr0deBxwcZKrkqzZweEkaV6Z01M+JEmzp6rWAmsnrTu36/PRAy9KkuYAR6glSZKkFgzUkiRJUgtO+ZBGzKpVq5iYmGDRokWMjfkiOkmSRp2BWhoxExMTjI+PD7sMSZLUI6d8SJIkSS0YqCVJkqQW5t2Uj6177fuo35IkSVIb8y5Q37/s2GGXIEmSpN2IUz4kSZKkFgzUkiRJUgsGakmSJKkFA7UkSZLUgoFakiRJasFALUmSJLVgoJYkSZJaMFBLkiRJLRioJUmSpBb6GqiTHJfk+iQbk5wzxfY3JtmQ5OokX07y1H7WI0mSJM22vgXqJAuAC4DjgeXAKUmWT2p2JbCiqg4BLgHG
+lWPJEmS1A/9HKE+HNhYVTdU1YPAhcCJ3Q2q6qtV9W/N4mXAgX2sR5IkSZp1/QzUBwC3di1vatbtyBnAP/axHkmSJGnW7dnHY2eKdTVlw+Q0YAXw4h1sXwmsBFiyZMls1afd3Hv/7HPTan/3nfc/8nu6+575v39vWu0lSdLuo58j1JuAxV3LBwK3TW6U5GjgLcAJVfWzqQ5UVaurakVVrVi4cGFfipUkSZJmop+Beh2wLMlBSfYCTgbWdDdIchjwfjph+o4+1iJJkiT1Rd8CdVVtAc4ELgWuAy6qqmuTnJ/khKbZ24HHARcnuSrJmh0cTpIkSRpJ/ZxDTVWtBdZOWndu1+ej+3l+SZIkqd98U6IkSZLUgoFakiRJasFALUkCIMlxSa5PsjHJOVNsf1GSK5JsSXLSMGqUpFFkoJYkkWQBcAFwPLAcOCXJ8knNbgFOBz452OokabT19aZESdKccTiwsapuAEhyIXAisGFbg6q6qdm2dRgFStKocoRakgRwAHBr1/KmZp0kaRcM1JIkgEyxrmZ0oGRlkvVJ1m/evLllWZI0+gzUkiTojEgv7lo+ELhtJgeqqtVVtaKqVixcuHBWipOkUWagliQBrAOWJTkoyV7AyYBvr5WkHhioJUlU1RbgTOBS4Drgoqq6Nsn5SU4ASPL8JJuAlwHvT3Lt8CqWpNHhUz6kPnvbadN7XO9dd9zT+T1x+7T3fcvfXjKt9lK3qloLrJ207tyuz+voTAWRJHVxhFqSJElqwUAtSZIktWCgliRJklowUEuSJEktGKglSZKkFgzUkiRJUgsGakmSJKkFA7UkSZLUgoFakiRJasFALUmSJLVgoJYkSZJaMFBLkiRJLRioJUmSpBb2HHYBkkbTk/d+wqN+S5KkqRmoJU3pzMNeOewSJEmaE5zyIUmSJLVgoJYkSZJaMFBLkiRJLRioJUmSpBYM1JIkSVILBmpJkiSpBQO1JEmS1IKBWpIkSWrBQC1JkiS1YKCWJEmSWjBQS5IkSS0YqCVJkqQWDNSSJElSC30N1EmOS3J9ko1Jzpli+y8k+ftm+3eSLO1nPZKkHbPPlqSZ6VugTrIAuAA4HlgOnJJk+aRmZwA/qaqnAe8C/le/6pEk7Zh9tiTNXD9HqA8HNlbVDVX1IHAhcOKkNicCH2s+XwIclSR9rEmSNDX7bEmaoX4G6gOAW7uWNzXrpmxTVVuAe4An97EmSdLU7LMlaYZSVf05cPIy4Leq6o+b5T8CDq+qs7raXNu02dQs/6hp86+TjrUSWNksHgxc37K8/YE7Wx5jNljHaNUA1jGZdTxa2zqeWlULZ6uY2TTiffZ8Mirfde2+/I5NT0/99p59LGATsLhr+UDgth202ZRkT+AJwF2TD1RVq4HVs1VYkvVVtWK2jmcdu0cN1mEdc6WOPhnZPns+2c2/YxoBfsf6o59TPtYBy5IclGQv4GRgzaQ2a4BXN59PAr5S/RoylyTtjH22JM1Q30aoq2pLkjOBS4EFwIer6tok5wPrq2oN8CHgE0k20hnlOLlf9UiSdsw+W5Jmrp9TPqiqtcDaSevO7fr878DL+lnDDozKpUjr+LlRqAGsYzLreLRRqaMvRrjPnk926++YRoLfsT7o202JkiRJ0nzgq8clSZKkFuZVoE7y4SR3JPn+EGtYnOSrSa5Lcm2S1w+pjr2TfDfJ95o6/nwYdXTVsyDJlUk+P8QabkpyTZKrkqwfYh1PTHJJkh8035NfH0INBzf/O2z7+WmSPx1CHW9ovp/fT/KpJHsP6Lzb9RVJXtbUsjWJd8irJ0kenvTf0tKdtF06zL+fNDcleXLX92siyXjX8l7Drm++mFdTPpK8CLgP+HhVPWtINTwFeEpVXZFkP+By4PerasOA6wiwb1Xdl+QxwDeB11fVZYOso6ueNwIrgMdX1e8OqYabgBVVNdTncyb5GPCNqvpg0xnuU1V3D7GeBcA48IKqunmA5z2AzvdyeVU9kOQiYG1VfXQA596u
r0jyTGAr8H7g7Koa2j+6NHckua+qHtdj26XA54f195PmviTnAfdV1TsmrQ+dzLd1KIXNA/NqhLqq/pkpnpk64Bpur6orms/3Atex/dvIBlFHVdV9zeJjmp+h/OsqyYHA7wAfHMb5R0mSxwMvovM0BarqwWGG6cZRwI8GGaa77Ak8tnnm8T5s/1zkvpiqr6iq66rKF5SotWYk+htJrmh+fmOKNr/WXEW8KsnVSZY160/rWv/+5h+80naSPK25uvc3wBXA4iR3d20/OckHm8+/nOQzSdY3368XDqvuuWpeBepR04xGHAZ8Z0jnX5DkKuAO4EtVNZQ6gL8EVtEZ/RumAr6Y5PJ03vQ2DL8CbAY+0kyB+WCSfYdUyzYnA58a9Emrahx4B3ALcDtwT1V9cdB1SC09tuvy+2ebdXcAx1TVc4FXAO+ZYr/XAO+uqkPpXL3b1FwleQVwRLP+YeDU/v8RNIctBz5UVYfRudK4I+8BxpoXvrwcB7imra+PzdOOJXkc8GngT6vqp8OooaoeBg5N8kTgs0meVVUDnb+X5HeBO6rq8iQvGeS5p3BEVd2W5JeALyX5QTNSOUh7As8Fzqqq7yR5N3AO8N8GXAcAzZSTE4A3D+HcvwicCBwE3A1cnOS0qvrbQdcitfBAE367PQZ4b5JtofjpU+z3beAtzRW8z1TVD5McBTwPWNe5gs9j6YRzaUd+VFXremh3NHBw870C+MUkj62qB/pX2u7FEeohaOYsfxr4u6r6zLDraaYUfA04bginPwI4oZm/fCFwZJKhBKaquq35fQfwWeDwIZSxCdjUdbXgEjoBe1iOB66oqh8P4dxHAzdW1eaqegj4DLDdpXFpDnoD8GPgOXRGn7e7cayqPknnH7MPAJcmORII8LGqOrT5Obiqzhtc2ZqD7u/6vJXOd2ib7pu8Axze9d06wDA9PQbqAWtuDPgQcF1VvXOIdSxsRqZJ8lg64eUHg66jqt5cVQdW1VI6Uwu+UlWnDbqOJPs2N4nSTLE4Fhj43fZVNQHcmuTgZtVRwEBvWJ3kFIYw3aNxC/DCJPs0/90cReeeA2muewJwe3OD2B/ReTPloyT5FeCGqnoPnVe+HwJ8GTipuYpGkicleergytZc1nzffpJkWZI9gD/o2vxPwGu3LTRXTzQN8ypQJ/kUnctoByfZlOSMIZRxBJ0O9MiueXW/PYQ6ngJ8NcnVwDo6c6iH9si6EfDLwDeTfA/4LvAPVfWFIdVyFvB3zf83hwL/YxhFJNkHOIbOyPDANaP0l9C5meYaOv3VQN7wNVVfkeQPkmwCfh34hySXDqIW7ZbeB7w6yWV0pnvcP0WbVwDfb+5zeQadJ85sAN5K516Pq4Ev0enLpV79F+ALdP5xtqlr/WuBI5obYDcAfzKM4uayefXYPEmSJGm2zasRakmSJGm2GaglSZKkFgzUkiRJUgsGakmSJKkFA7UkSZLUgoFac0aSh7seNXhV8+r2tsd8TZJXNZ8/muSkXbT/z0muaR4t9P0kJzbrz09ydNt6JGk+ax5PWUme0SwvTTLwdwJI0+WrxzWXTPUK31aq6m96bdu8AvgtwHOr6p7m9fELm+OcO5t1SdI8dQrwTTov+jpvuKVIvXOEWnNaM3rxjSRXND+/0ax/SZKvJ7koyb8k+Yskpyb5bjPC/KtNu/OSnD3pmEcl+WzX8jFJPgP8EnAvcB9AVd1XVTc2bT6a5KQkK7pG0K9JUs32X03yhSSXN/U+YyD/A0nSHNEMUhwBnEEnUE/evneSjzR965VJXtqsPz3JZ5o+9odJxrr2OTbJt5u/Hy5uziHNOgO15pLHdoXVbYH3DuCYqnounTeLvaer/XOA1wPPpvN2yqdX1eHAB+m8jXBHvgI8M8nCZvk/AR8Bvgf8GLix6dR/b/KOVbW+qg5tRtK/ALyj2bQaOKuqngecTedNaZKkn/t94AtV9S/AXUmeO2n7awGq6tl0RrI/lmTvZtuhdP4OeDbwiiSLk+xP
582SRzd/R6wH3jiAP4fmIad8aC6ZasrHY4D3JjkUeJjOa3y3WVdVtwMk+RHwxWb9NcBLd3SSqqoknwBOS/IROq+aflVVPZzkOOD5wFHAu5I8r6rOm3yMJC8Hngsc24yI/AZwcZJtTX5hGn9uSZoPTgH+svl8YbN8Qdf23wT+CqCqfpDkZn7e53+5qu4BaF6d/VTgicBy4FtN37sX8O0+/xk0TxmoNde9gc6o8XPoXHH5965tP+v6vLVreSu7/u5/BPhcc7yLq2oLdMI28F3gu0m+1LQ7r3vHJL8G/DnwoiaE7wHcPdvzvyVpd5HkycCRwLOaqXILgOLRV/My1b6N7v7+YTp9fIAvVdUps1yutB2nfGiuewJwe1VtpTOtY8FsHLSqbgNuo3O58KMASf7DpEuQhwI3d++X5Al0RlZeVVWbm2P9lM40kZc1bZLkObNRpyTtJk4CPl5VT62qpVW1GLgROLCrzT8DpwIkeTqwBLh+J8e8DDgiydOaffZp9pNmnYFac937gFcnuYzOpb/7Z/HYfwfcWlUbmuXHAO9I8oMkV9GZr/f6Sfv8Pp1LjR/YNt+7WX8qcEaS7wHXAifOYp2SNNedAnx20rpPA/+1a/l9wIIk1wB/D5xeVT9jB5pBjdOBTyW5mk7A9oZw9UU6V7AlTZbkvcCVVfWhYdciSZJGl4FamkKSy+mMdh+zsxEQSZIkA7UkSZLUgnOoJUmSpBYM1JIkSVILBmpJkiSpBQO1JEmS1IKBWpIkSWrBQC1JkiS18P8BX1kO9kQDNIkAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 864x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(figsize=(12,4))\n",
    "plt.subplot(121)\n",
    "sns.barplot(x='FamilySize',y='Survived',data=train)\n",
    "plt.subplot(122)\n",
    "sns.barplot(x='Alone',y='Survived',data=train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Family size is related to the survival rate: passengers travelling alone survived at a lower rate."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Name\n",
    "\n",
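    "A name may look unrelated to survival, but the Name field holds more than the passenger's name: it also contains a title such as Mrs. or Miss., which reflects some characteristics of the passenger. The title can be extracted from each name."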
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0                              Braund, Mr. Owen Harris\n",
       "1    Cumings, Mrs. John Bradley (Florence Briggs Th...\n",
       "2                               Heikkinen, Miss. Laina\n",
       "3         Futrelle, Mrs. Jacques Heath (Lily May Peel)\n",
       "4                             Allen, Mr. William Henry\n",
       "Name: Name, dtype: object"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train['Name'].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def get_title(name):\n",
    "    # The title is the word that follows a space and ends with a period, e.g. ' Mr.'\n",
    "    r_title = re.search(r' ([A-Za-z]+)\\.', name)\n",
    "    if not r_title:\n",
    "        return \"\"\n",
    "    \n",
    "    return r_title.group(1)\n",
    "\n",
    "train['Title'] = train['Name'].apply(get_title)"
   ]
  },
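  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a possible alternative to the `apply`-based extraction above, pandas can run the same regular expression over the whole column with `Series.str.extract`. This is only a sketch on three names copied from the output above; the variable names `names` and `titles` are made up for the example.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Three names copied from the head() output above\n",
    "names = pd.Series([\n",
    "    'Braund, Mr. Owen Harris',\n",
    "    'Heikkinen, Miss. Laina',\n",
    "    'Futrelle, Mrs. Jacques Heath (Lily May Peel)',\n",
    "])\n",
    "\n",
    "# Same pattern as get_title: a space, some letters, then a period\n",
    "titles = names.str.extract(r' ([A-Za-z]+)\\.', expand=False)\n",
    "print(titles.tolist())\n",
    "```\n",
    "\n",
    "One difference from `get_title`: a name without a match yields NaN here rather than an empty string."
   ]
  },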
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Mr          517\n",
       "Miss        182\n",
       "Mrs         125\n",
       "Master       40\n",
       "Dr            7\n",
       "Rev           6\n",
       "Col           2\n",
       "Mlle          2\n",
       "Major         2\n",
       "Capt          1\n",
       "Don           1\n",
       "Sir           1\n",
       "Ms            1\n",
       "Jonkheer      1\n",
       "Countess      1\n",
       "Lady          1\n",
       "Mme           1\n",
       "Name: Title, dtype: int64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train['Title'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Titles that occur only rarely can be merged into a single category:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Mr        517\n",
       "Miss      182\n",
       "Mrs       125\n",
       "Master     40\n",
       "Rare       27\n",
       "Name: Title, dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Everything other than the four common titles is treated as rare\n",
    "rare_mask = ~train['Title'].isin(['Mr','Miss','Mrs','Master'])\n",
    "train.loc[rare_mask, 'Title'] = 'Rare'\n",
    "train['Title'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7fd13d110048>"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAELCAYAAAA2mZrgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFzhJREFUeJzt3X+wX3Wd3/Hni7ApiqBdyZqdBCTVKJt1qYxXdIZWUYHG3VnSiu4GtJXWbsapwY6uZuPoUDeO1Qar4y7YMTtLdZ1qFnWnjUx24hbxRxHdBEEgsHHT+CM36a3Bn0jZhcC7f3xPDl++ubnfb8I995sbno+ZO/f8+HzPeZ+TyX19z+f8SlUhSRLASeMuQJJ0/DAUJEktQ0GS1DIUJEktQ0GS1DIUJEmtTkMhycoku5LsTrJ+mvnPTnJTkjuTfDnJ0i7rkSTNLF3dp5BkAfAd4GJgEtgOXF5V9/S1+SxwY1V9MskrgX9dVf+yk4IkSUN1eaRwPrC7qvZU1UPAZmDVQJsVwE3N8M3TzJckzaEuQ2EJsLdvfLKZ1u/bwGXN8L8ATkvyzA5rkiTN4OQOl51ppg32Vb0DuDbJlcBXgX3AwcMWlKwB1gCceuqpLzrnnHNmt1JJOsHddttt91XVomHtugyFSeDMvvGlwP7+BlW1H3gNQJKnAZdV1c8GF1RVm4BNABMTE7Vjx46uapakE1KS74/Srsvuo+3A8iTLkiwEVgNb+hskOSPJoRreBVzfYT2SpCE6C4WqOgisBbYB9wI3VNXOJBuSXNo0uxDYleQ7wLOA93dVjyRpuM4uSe2K3UeSdPSS3FZVE8PaeUezJKllKEiSWoaCJKllKEiSWoaCJKnV5c1rehJYt24dU1NTLF68mI0bN467HElPkKGgJ2Rqaop9+/aNuwxJs8TuI0lSy1CQJLUMBUlSy1CQJLUMBUlSy1CQJLUMBUlSy1CQJLU6DYUkK5PsSrI7yfpp5p+V5OYktye5M8lvdlmPJGlmnYVCkgXAdcCrgRXA5UlWDDR7D703sp1H73WdH+uqHknScF0eKZwP7K6qPVX1ELAZWDXQpoDTm+GnA/s7rEeSNESXzz5aAuztG58EXjLQ5r3AF5NcBZwKXNRhPZKkIbo8Usg00wZfCH058ImqWgr8JvCpJIfVlGRNkh1Jdhw4cKCDUiVJ0G0oTAJn9o0v5fDuoTcBNwBU1a3AKcAZgwuqqk1VNVFVE4sWLeqoXElSl91H24HlSZYB++idSL5ioM0PgFcBn0jya/RCwUOBWfCDDb8xJ+s5+ONfBk7m4I+/3/k6z7r6rk6XL6nDI4WqOgisBbYB99K7ymhnkg1JLm2a/T7we0m+DXwGuLKqBruYJElzpNOX7FTVVmDrwLSr+4bvAS7osgZJ0ui8o1mS1DIUJEktQ0GS1Or0nIL0ZLFu3TqmpqZYvHgxGzduHHc50jEzFKRZMDU1xb59+8ZdhvSE2X0kSWoZCpKklqEgSWoZCpKklqEgSWoZCpKklqEgSWp5n4KkWeWNfPOboSBpVnkj3/xm95EkqWUoSJJanYZCkpVJdiXZnWT9NPM/kuSO5uc7SX7aZT2afWec8ijPespBzjjl0XGXImkWdHZOIckC4DrgYmAS2J5kS/O2NQCq6m197a8CzuuqHnXjHeea49KJpMsjhfOB3VW1p6oeAjYDq2Zofzm99zRLksaky1BYAuztG59sph0mybOBZcCXjjB/TZIdSXYcOHBg1guVJPV0eUlqpplWR2i7GvhcVT0y3cyq2gRsApiYmDjSMqRpXfDHF3S+joU/XchJnMTen+6dk/XdctUtna9DT05dHilMAmf2jS8F9h+h7WrsOpKksesyFLYDy5MsS7KQ3h/+LYONkjwf+IfArR3WIkkaQWehUFUHgbXANuBe4Iaq2plkQ5JL+5peDmyuKruFJGnMOn3MRVVtBbYOTLt6YPy9XdYg
SRqddzRLklqGgiSpZShIklqGgiSpZShIklqGgiSpZShIklqGgiSpZShIklqGgiSp1eljLiTpyWzdunVMTU2xePFiNm7cOO5yRmIoSFJHpqam2Ldv37jLOCqGgjQL6qnFozxKPdWH/Wp+MxSkWfDwBQ+PuwRpVniiWZLU6jQUkqxMsivJ7iTrj9Dmd5Lck2Rnkk93WY8kaWaddR8lWQBcB1xM733N25Nsqap7+tosB94FXFBVP0nyK13VI0karssjhfOB3VW1p6oeAjYDqwba/B5wXVX9BKCqfthhPZKkIboMhSXA3r7xyWZav+cBz0tyS5JvJFnZYT2SpCG6vPoo00wbvF7vZGA5cCGwFPhakhdU1U8ft6BkDbAG4Kyzzpr9SiVJQLdHCpPAmX3jS4H907T5H1X1cFV9F9hFLyQep6o2VdVEVU0sWrSos4Il6cmuy1DYDixPsizJQmA1sGWgzX8HXgGQ5Ax63Ul7OqxJkjSDzkKhqg4Ca4FtwL3ADVW1M8mGJJc2zbYBP0pyD3Az8M6q+lFXNUmSZtbpHc1VtRXYOjDt6r7hAt7e/EiSxsw7miVJLZ99JD2JfOVlL+98HQ+evAASHpycnJP1vfyrX+l8HU8mHilIklqGgiSpZShIklqGgiSpZShIklqGgiSpZShIklqGgiSpZShIklqGgiSpNeNjLpLcz+EvxmlV1emzXpEkaWxmDIWqOg0gyQZgCvgUvTeqvR44rfPqJElzatTuo39WVR+rqvur6udV9V+Ay7osTJI090YNhUeSvD7JgiQnJXk98MiwDyVZmWRXkt1J1k8z/8okB5Lc0fz826PdAEnS7Bn10dlXAB9tfgq4pZl2REkWANcBF9N7F/P2JFuq6p6Bpn9eVWuPqmpJUidGCoWq+h6w6iiXfT6wu6r2ACTZ3CxjMBQkSceJkbqPkjwvyU1J7m7Gz03yniEfWwLs7RufbKYNuizJnUk+l+TMkaqWJHVi1HMKfwK8C3gYoKruBFYP+UymmTZ4eesXgLOr6lzgfwKfnHZByZokO5LsOHDgwIglS5KO1qih8NSq+uuBaQeHfGYS6P/mvxTY39+gqn5UVX/fjP4J8KLpFlRVm6pqoqomFi1aNGLJkqSjNWoo3JfkOTTf9JO8Fvg/Qz6zHVieZFmShfSOLLb0N0jyq32jlwL3jliPJKkDo1599BZgE3BOkn3Ad+ndwHZEVXUwyVpgG7AAuL6qdjY3wu2oqi3AW5NcSu+o48fAlce2GZKk2TBqKHy/qi5KcipwUlXdP8qHqmorsHVg2tV9w++id65CknQcGLX76LtJNgEvBX7RYT2SpDEaNRSeT+/qoLfQC4hrk/yT7sqSJI3DSKFQVQ9W1Q1V9RrgPOB04CudViZJmnMjv08hycuTfAz4FnAK8DudVSVJGouRTjQn+S5wB3AD8M6qeqDTqiRJYzHq1Uf/uKp+3mklkqSxG/bmtXVVtRF4f5LD3sBWVW/trDJJ0pwbdqRw6A7jHV0XIkkav2Gv4/xCM3hnVd0+B/VI0py49ve/MLzRE/TT+x5of8/F+tb+599+wssY9eqjDyf5myTvS/LrT3itkqTj0qj3KbwCuBA4AGxKctcI71OQJM0zI9+nUFVTVfVHwJvpXZ569ZCPSJLmmVHfvPZrSd7bvHntWuDr9N6PIEk6gYx6n8J/BT4DXFJV+4c1liTNT0NDIckC4H9X1UfnoB5J89wzqh73W/PL0O6jqnoEeGbz9rSjkmRlkl1JdidZP0O71yapJBNHuw5Jx5c3PPIoaw8+whseeXTcpegYjPySHeCWJFuA9rlHVfXhI32gOcK4DriY3vuatyfZUlX3DLQ7DXgr8M2jrF2SNMtGvfpoP3Bj0/60vp+ZnA/srqo9VfUQsBlYNU279wEbgb8bsRZJUkdGOlKoqj88hmUvAfb2jU8CL+lvkOQ84MyqujHJO45hHZKkWTTqo7NvBqZ7IN4rZ/rYNNPaZSQ5CfgIcOUI618D
rAE466yzhjWXJB2jUc8p9H+LPwW4DDg45DOTwJl940vpdUMdchrwAuDLSQAWA1uSXFpVj3sAX1VtAjYBTExMeEmDJHVk1O6j2wYm3ZJk2Os4twPLkywD9gGrgSv6lvkz4IxD40m+DLxjMBAkSXNn1O6jX+4bPQmYoPfN/oiq6mCStcA2YAFwfVXtTLIB2FFVW46xZklSR0btPrqNx84HHAS+B7xp2IeqaiuwdWDatM9MqqoLR6xFktSRYW9eezGwt6qWNeNvpHc+4XvAPTN8VJI0Dw27T+HjwEMASV4GfAD4JPAzmhO/kqQTx7DuowVV9eNm+HeBTVX1eeDzSe7otjRJ0lwbdqSwIMmh4HgV8KW+eaOej5AkzRPD/rB/BvhKkvuAB4GvASR5Lr0uJEnSCWTGUKiq9ye5CfhV4ItV7bNwTwKu6ro4SdLcGtoFVFXfmGbad7opR5I0TiO/o1mSdOIzFCRJLUNBktQyFCRJLUNBktQyFCRJLUNBktQyFCRJrU5DIcnKJLuS7E6yfpr5b05yV5I7kvyvJCu6rEeSNLPOQiHJAuA64NXACuDyaf7of7qqfqOqXghsBD7cVT2SpOG6PFI4H9hdVXuq6iFgM7Cqv0FV/bxv9FQee7ubJGkMunz89RJgb9/4JPCSwUZJ3gK8HVgIvLLDeiRJQ3R5pJBpph12JFBV11XVc4A/AN4z7YKSNUl2JNlx4MCBWS5TknRIl6EwCZzZN74U2D9D+83AP59uRlVtqqqJqppYtGjRLJYoSerXZShsB5YnWZZkIbAa2NLfIMnyvtHfAv62w3okSUN0dk6hqg4mWQtsAxYA11fVziQbgB1VtQVYm+Qi4GHgJ8Abu6pHkjRcp+9ZrqqtwNaBaVf3Df/7LtcvSTo63tEsSWoZCpKklqEgSWoZCpKklqEgSWoZCpKkVqeXpErSk9mpC09/3O/5wFCQpI5c8JzXjLuEo2b3kSSpZShIklqGgiSpZShIklqGgiSpZShIklqGgiSpZShIklqdhkKSlUl2JdmdZP0089+e5J4kdya5Kcmzu6xHkjSzzkIhyQLgOuDVwArg8iQrBprdDkxU1bnA54CNXdUjSRquyyOF84HdVbWnqh4CNgOr+htU1c1V9f+a0W8ASzusR5I0RJehsATY2zc+2Uw7kjcBfzndjCRrkuxIsuPAgQOzWKIkqV+XoZBpptW0DZM3ABPANdPNr6pNVTVRVROLFi2axRIlSf26fErqJHBm3/hSYP9goyQXAe8GXl5Vf99hPbNm3bp1TE1NsXjxYjZu9DSIpBNHl6GwHVieZBmwD1gNXNHfIMl5wMeBlVX1ww5rmVVTU1Ps27dv3GVI0qzrrPuoqg4Ca4FtwL3ADVW1M8mGJJc2za4BngZ8NskdSbZ0VY8kabhOX7JTVVuBrQPTru4bvqjL9UuSjo53NEuSWoaCJKllKEiSWoaCJKllKEiSWp1efTTXXvTOP5uT9Zx23/0sAH5w3/2dr/O2a/5Vp8uXpH4eKUiSWoaCJKllKEiSWoaCJKllKEiSWoaCJKllKEiSWifUfQpz5dGFpz7utySdKAyFY/DA8kvGXYIkdaLT7qMkK5PsSrI7yfpp5r8sybeSHEzy2i5rkSQN11koJFkAXAe8GlgBXJ5kxUCzHwBXAp/uqg5J0ui67D46H9hdVXsAkmwGVgH3HGpQVd9r5j3aYR2SpBF12X20BNjbNz7ZTJMkHae6DIVMM62OaUHJmiQ7kuw4cODAEyxLknQkXYbCJHBm3/hSYP+xLKiqNlXVRFVNLFq0aFaKkyQdrstQ2A4sT7IsyUJgNbClw/VJkp6gzkKhqg4Ca4FtwL3ADVW1M8mGJJcCJHlxkkngdcDHk+zsqh5J0nCd3rxWVVuBrQPTru4b3k6vW0mSdBzw2UeSpJahIElqGQqSpJahIElqGQqSpJahIElqGQqSpJahIElqGQqSpJahIElqGQqSpJahIElqGQqSpJahIElqGQqSpFan
oZBkZZJdSXYnWT/N/H+Q5M+b+d9McnaX9UiSZtZZKCRZAFwHvBpYAVyeZMVAszcBP6mq5wIfAf5TV/VIkobr8kjhfGB3Ve2pqoeAzcCqgTargE82w58DXpUkHdYkSZpBl6GwBNjbNz7ZTJu2TfNO558Bz+ywJknSDLp8R/N03/jrGNqQZA2wphn9RZJdT7C22XAGcF/XK8mH3tj1KmbDnOwL/sNxfxA5N/sByFvdF63jv3NhzvbFVR+ecfazR1lGl6EwCZzZN74U2H+ENpNJTgaeDvx4cEFVtQnY1FGdxyTJjqqaGHcdxwP3RY/74THui8fMt33RZffRdmB5kmVJFgKrgS0DbbYAh74Kvxb4UlUddqQgSZobnR0pVNXBJGuBbcAC4Pqq2plkA7CjqrYAfwp8KsluekcIq7uqR5I0XJfdR1TVVmDrwLSr+4b/DnhdlzV06Ljqzhoz90WP++Ex7ovHzKt9EXtrJEmH+JgLSVLLUBgiSSX5VN/4yUkOJLlxnHXNpSf7Phi2/Ukune4xLvPZbP6bJ3lGkn83uxWOX5JHktyR5O4kX0jyjHHXNBsMheEeAF6Q5CnN+MXAvukaNpfVnoie7Ptgxu2vqi1V9cGxVNadkf/NR/AM4KhCIT3H+9+nB6vqhVX1AnoXyrxl1A8ez9t3XBZ1HPpL4Lea4cuBzxyakeS9STYl+SLwZ+Mobo6MvA+S/HqSv26+Rd2ZZPk4Cp5lM23/lUmubYZf13xz/HaSrzbT5uv+mGmbz0/y9SS3N7+f30yfbls/CDynmXZN0+6dSbY3bf6wmXZ2knuTfAz4Fo+/z+l4dyvNExuSPC3JTUm+leSuJKua6YdtX5JLktzatP1skqeNcRt6qsqfGX6AXwDn0ns20ynAHcCFwI3N/PcCtwFPGXetx8s+AP4YeH0zvHC+75sRtv9K4Npm+C5gSTP8jPm6P0bY5tOBk5vhi4DPH2lbgbOBu/uWfQm9K3JC74vpjcDLmnaPAi8d9/aPuo+a3wuAzwIrm/GTgdOb4TOA3c22Pm77mnlfBU5txv8AuHrc23UiHurPuqq6s3ms9+UMXGLb2FJVD85pUXPsKPfBrcC7kywF/qKq/nZuquzOCNt/yC3AJ5LcAPxFM21e7o8h2/x04JPNkUABv9RMP2xbp3nG5SXNz+3N+NOA5cAPgO9X1TdmeVO68pQkd9D7Y38b8FfN9AD/McnL6IXAEuBZzbz+7XspvSdI39Lso4X09t9Y2X00ui3Ah+g7hO7zwBzXMi4j7YOq+jRwKfAgsC3JK+emvM7NtP0AVNWbgffQ6/q4I8kz5/n+ONI2vw+4uXr96b9N72hi1H/7AB+oXn/8C6vquVX1p828+fR/6cGqeiG9Zwot5LFzCq8HFgEvaub/X5r9w+O3L8Bf9e2HFVX1pjmq/YgMhdFdD2yoqrvGXcgYjbQPkvwjYE9V/RG9PyrnzkVxc2Do9id5TlV9s3o3ad5Hr994Pu+PI23z03nsxPOVhyYeYVvvB07r++w24N8c6j9PsiTJr3RTfveq6mfAW4F3JPklevvmh1X1cJJXcOQH0X0DuCDJcwGSPDXJ8+ak6BkYCiOqqsmq+ui46xino9gHvwvc3Rxan8MJcgJ+xO2/pjm5eDe9/uJvM4/3xwzbvBH4QJJb6PWpH3LYtlbVj+h1kdyd5Jqq+iLwaeDWJHfRO29xGvNYVd1O7996NfDfgIkkO+gdNfzNET5zgF6gfibJnfRC4pw5KXgG3tEsSWp5pCBJahkKkqSWoSBJahkKkqSWoSBJahkK0hBJntk8t+eOJFNJ9vWNf71pc3aSK/o+c2GeJE+R1YnFx1xIQzTX2b8Qeg//o/fMmw8NNDsbuILe9ffSvOWRgvQEJPlFM/hB4J82Rw9vG2hzapLrm6eC3n7oqZnS8chQkGbHeuBrzTNsPjIw793Al6rqxcAr6N31fOqcVyiNwFCQuncJsL559MOX6T0c7ayxViQdgecUpO4FuKyq
do27EGkYjxSk2TH4JNB+24Cr0jw0P8l5c1aVdJQMBWl23AkcbF7D+baBee+j9xKaO5unp75vzquTRuRTUiVJLY8UJEktQ0GS1DIUJEktQ0GS1DIUJEktQ0GS1DIUJEktQ0GS1Pr/fkNd2RbdQaUAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.barplot(x='Title', y='Survived', data=train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
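    "## Preprocessing the data\n",
    "\n",
    "The previous steps gave a first impression of the data and of how strongly each feature relates to the problem. The next step is to preprocess the data so that a classifier can be trained on it. This involves the following steps:\n",
    "\n",
    "1. Fill in missing values\n",
    "2. Add new features, such as whether the passenger travelled alone\n",
    "3. Bin continuous values such as age and fare\n",
    "4. Convert categorical values into one-hot encoded vectors"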
   ]
  },
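  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 3 can be sketched with `pandas.cut`. The bin edges below are the ones this notebook later uses for Age; the sample ages are made up for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up ages, one per bin\n",
    "ages = pd.Series([5, 22, 38, 50, 70])\n",
    "\n",
    "# Intervals are right-inclusive by default: (-inf, 15], (15, 30], ...\n",
    "age_group = pd.cut(ages, bins=[-np.inf, 15, 30, 45, 60, np.inf], labels=[0, 1, 2, 3, 4])\n",
    "print(age_group.tolist())\n",
    "```\n",
    "\n",
    "Because the intervals are closed on the right, an age of exactly 15 falls into group 0 rather than group 1."
   ]
  },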
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
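    "### Filling missing values\n",
    "\n",
    "As analysed earlier, four attributes have missing values: Age, Cabin, Embarked and Fare. Cabin is the cabin number and is dropped here, so the missing values in Age, Embarked and Fare need to be filled.\n",
    "\n",
    "**Filling Age**\n",
    "\n",
    "The exact age is missing for 177 passengers, which is a large share of the 891 training rows, so Age should be filled as sensibly as possible."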
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(177, 17)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train[np.isnan(train['Age'])].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
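    "The Title gives a rough indication of age; for example, a Miss is likely to be a woman of around 20. Missing ages are therefore filled per Title with random draws from a normal distribution, whose mean and standard deviation for each Title can be obtained with `describe()`:"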
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Master': {'mean': 4.574166666666667, 'std': 3.6198716433439615},\n",
       " 'Miss': {'mean': 21.773972602739725, 'std': 12.99029242215268},\n",
       " 'Mr': {'mean': 32.368090452261306, 'std': 12.70879272257399},\n",
       " 'Mrs': {'mean': 35.898148148148145, 'std': 11.433627902196413},\n",
       " 'Rare': {'mean': 42.38461538461539, 'std': 13.200233098174964}}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_std_info = train.groupby('Title')['Age'].describe()[['mean','std']].T.to_dict()\n",
    "mean_std_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fill_age(title):\n",
    "    # Draw one age from the normal distribution fitted to this title\n",
    "    mean = mean_std_info[title]['mean']\n",
    "    std = mean_std_info[title]['std']\n",
    "    return np.random.normal(mean, std)\n",
    "\n",
    "age_missing = train['Age'].isna()\n",
    "train.loc[age_missing, 'Age'] = train.loc[age_missing, 'Title'].apply(fill_age)"
   ]
  },
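  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Drawing from a normal distribution can occasionally produce a negative age (for Master the mean is about 4.6 and the standard deviation about 3.6). One possible variation, which is not part of the original notebook, is to use a seeded generator for reproducibility and to clip the draws at zero. The toy data frame, the `stats` dictionary and the seed are all made up for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "rng = np.random.default_rng(0)  # fixed seed so the fill is reproducible\n",
    "\n",
    "# Toy data: two titles, some ages missing\n",
    "df = pd.DataFrame({\n",
    "    'Title': ['Master', 'Master', 'Mr', 'Mr', 'Mr'],\n",
    "    'Age': [4.0, np.nan, 30.0, np.nan, np.nan],\n",
    "})\n",
    "# Per-title statistics, rounded from the values printed above\n",
    "stats = {'Master': {'mean': 4.57, 'std': 3.62}, 'Mr': {'mean': 32.37, 'std': 12.71}}\n",
    "\n",
    "missing = df['Age'].isna()\n",
    "means = df.loc[missing, 'Title'].map(lambda t: stats[t]['mean']).to_numpy()\n",
    "stds = df.loc[missing, 'Title'].map(lambda t: stats[t]['std']).to_numpy()\n",
    "\n",
    "# One draw per missing row, clipped so that no age is negative\n",
    "df.loc[missing, 'Age'] = np.clip(rng.normal(means, stds), 0, None)\n",
    "print(df)\n",
    "```\n",
    "\n",
    "Clipping distorts the distribution slightly near zero, so whether it is worth doing depends on how the ages are used afterwards."
   ]
  },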
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Filling Embarked**\n",
    "\n",
    "Embarked is missing for only 2 passengers, so it is simply filled with the most frequent value, S."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "S    644\n",
       "C    168\n",
       "Q     77\n",
       "Name: Embarked, dtype: int64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(train['Embarked'].isna().sum())\n",
    "train['Embarked'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "train['Embarked'] = train['Embarked'].fillna('S')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Filling Fare**\n",
    "\n",
    "Fare has one missing value in the test set, which is filled with the mean Fare."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['Fare'] = test['Fare'].fillna(test['Fare'].mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
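    "### One-hot encoding\n",
    "\n",
    "After the steps above, every passenger attribute is a categorical value, which still has to be turned into a vector. Pclass, for example, takes the values 1, 2 and 3, but the magnitude of these numbers carries no meaning. A linear model such as Logistic Regression computes `wx+b`, however, so it would treat the magnitude as meaningful. For this reason all categorical values are converted into one-hot encoded vectors.\n",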
    "\n",
    "```\n",
    "1 -> [1 0 0]\n",
    "2 -> [0 1 0]\n",
    "3 -> [0 0 1]\n",
    "```\n",
    "\n",
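    "Pclass is then represented by a three-dimensional vector, and the other categorical attributes are handled in the same way.\n",
    "\n",
    "One-hot encoding of categorical values can be done with pandas.du"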
   ]
  },
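  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of the mapping above, and not necessarily the function this notebook goes on to use, `pandas.get_dummies` produces one indicator column per category. The `toy` frame is made up for the example.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Pclass stored as strings so that it is treated as categorical\n",
    "toy = pd.DataFrame({'Pclass': ['1', '2', '3', '1']})\n",
    "\n",
    "# One 0/1 column per distinct value of Pclass\n",
    "encoded = pd.get_dummies(toy, columns=['Pclass'], dtype=int)\n",
    "print(encoded)\n",
    "```\n",
    "\n",
    "Each row has exactly one 1, in the column of its class, which matches the `1 -> [1 0 0]` mapping shown above."
   ]
  },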
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
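    "### Putting it all together\n",
    "\n",
    "The analysis above has settled the following points:\n",
    "\n",
    "- Which attributes to drop and which to keep\n",
    "- How to preprocess each attribute\n",
    "- Which attributes to add\n",
    "- How to handle missing values\n",
    "- How to encode categorical values\n",
    "\n",
    "These steps are now combined into a single function."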
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "def process(dataset):\n",
    "    # Fill missing values\n",
    "    dataset['Fare'] = dataset['Fare'].fillna(dataset['Fare'].mean())\n",
    "    dataset['Embarked'] = dataset['Embarked'].fillna('S')\n",
    "    \n",
    "    # Add Title\n",
    "    def get_title(name):\n",
    "        r_title = re.search(r' ([A-Za-z]+)\\.', name)\n",
    "        if not r_title:\n",
    "            return \"\"\n",
    "        return r_title.group(1)\n",
    "    \n",
    "    dataset['Title'] = dataset['Name'].apply(get_title)\n",
    "    rare_mask = ~dataset['Title'].isin(['Mr','Miss','Mrs','Master'])\n",
    "    dataset.loc[rare_mask, 'Title'] = 'Rare'\n",
    "    \n",
    "    # Fill missing Age from the per-title statistics in mean_std_info (computed on the training set)\n",
    "    def fill_age(title):\n",
    "        mean = mean_std_info[title]['mean']\n",
    "        std = mean_std_info[title]['std']\n",
    "        return np.random.normal(mean, std)\n",
    "\n",
    "    age_missing = dataset['Age'].isna()\n",
    "    dataset.loc[age_missing, 'Age'] = dataset.loc[age_missing, 'Title'].apply(fill_age)\n",
    "    \n",
    "    # Add new attributes\n",
    "    dataset['AgeGroup'] = pd.cut(dataset['Age'], bins=[-np.inf, 15, 30, 45, 60, np.inf], labels=range(5))\n",
    "    dataset['FareGroup'] = pd.cut(dataset['Fare'], bins=[-np.inf, 20, 40, 60, 80, np.inf], labels=range(5))\n",
    "    dataset['FamilySize'] = dataset[['SibSp', 'Parch']].sum(axis=1) + 1\n",
    "    dataset['Alone'] = dataset['FamilySize'] == 1\n",
    "    \n",
    "    # Select the attributes to keep (copy so later assignments do not touch a view)\n",
    "    attrs = ['Pclass', 'Sex', 'Embarked', 'AgeGroup', 'FareGroup', 'FamilySize', 'Alone', 'Title']\n",
    "    dataset = dataset[attrs].copy()\n",
    "\n",
    "    # Convert numeric attributes to strings so they can be one-hot encoded\n",
    "    dataset[['Pclass', 'FamilySize']] = dataset[['Pclass', 'FamilySize']].astype(str)\n",
    "\n",
    "    return dataset\n",
    "    \n",
    "train_processed = process(train_clone)\n",
    "test_processed = process(test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 8 columns):\n",
      "Pclass        891 non-null object\n",
      "Sex           891 non-null object\n",
      "Embarked      891 non-null object\n",
      "AgeGroup      891 non-null category\n",
      "FareGroup     891 non-null category\n",
      "FamilySize    891 non-null object\n",
      "Alone         891 non-null bool\n",
      "Title         891 non-null object\n",
      "dtypes: bool(1), category(2), object(5)\n",
      "memory usage: 37.6+ KB\n",
      "None\n",
      "----------------------------------------\n",
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 418 entries, 0 to 417\n",
      "Data columns (total 8 columns):\n",
      "Pclass        418 non-null object\n",
      "Sex           418 non-null object\n",
      "Embarked      418 non-null object\n",
      "AgeGroup      418 non-null category\n",
      "FareGroup     418 non-null category\n",
      "FamilySize    418 non-null object\n",
      "Alone         418 non-null bool\n",
      "Title         418 non-null object\n",
      "dtypes: bool(1), category(2), object(5)\n",
      "memory usage: 17.8+ KB\n",
      "None\n"
     ]
    }
   ],
   "source": [
    "print(train_processed.info())\n",
    "print(\"-\" * 40)\n",
    "print(test_processed.info())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
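    "### Building the training data\n",
    "\n",
    "The previous steps filled the missing data, added some attributes and binned the continuous values, but did not apply one-hot encoding. A value of some attribute may be absent from the training set yet appear in the test set, which causes problems when encoding, so the one-hot encoding step has to be able to cope with this case.\n",
    "\n",
    "Here the training and test data are concatenated so that every possible value of each attribute is known, and sklearn's `OneHotEncoder` then performs the one-hot encoding."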
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import OneHotEncoder\n",
    "\n",
    "dataset = pd.concat([train_processed, test_processed], axis=0)\n",
    "\n",
    "one_hot = OneHotEncoder()\n",
    "one_hot.fit(dataset)\n",
    "\n",
    "X_train = one_hot.transform(train_processed)\n",
    "X_test = one_hot.transform(test_processed)\n",
    "\n",
    "y_train = train_clone['Survived'].values\n",
    "\n",
    "# Shuffle the training data\n",
    "index = np.random.permutation(X_train.shape[0])\n",
    "X_train = X_train[index]\n",
    "y_train = y_train[index]"
   ]
  },
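  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Concatenating the two sets is one way to handle unseen values. Another option, shown here as a sketch rather than as what this notebook does, is to fit the encoder on the training data only and pass `handle_unknown='ignore'`, so that a category first seen at transform time is encoded as all zeros. The toy arrays are made up for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.preprocessing import OneHotEncoder\n",
    "\n",
    "# Toy data: the value 'Q' occurs only in the test set\n",
    "train_toy = np.array([['S'], ['C'], ['S']])\n",
    "test_toy = np.array([['C'], ['Q']])\n",
    "\n",
    "encoder = OneHotEncoder(handle_unknown='ignore')\n",
    "encoder.fit(train_toy)\n",
    "\n",
    "# Columns follow the sorted training categories: C, S\n",
    "test_encoded = encoder.transform(test_toy).toarray()\n",
    "print(test_encoded)\n",
    "```\n",
    "\n",
    "The row for 'Q' comes out as all zeros, so the model simply receives no signal for that attribute instead of the transform raising an error."
   ]
  },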
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
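    "## Training classifiers\n",
    "\n",
    "With the training data ready, the models can be trained. Several common classifiers are tried at once, with largely default settings."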
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.naive_bayes import BernoulliNB\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.ensemble import AdaBoostClassifier\n",
    "from sklearn.ensemble import BaggingClassifier\n",
    "from sklearn.ensemble import ExtraTreesClassifier\n",
    "\n",
    "from sklearn.metrics import accuracy_score\n",
    "from sklearn.model_selection import cross_val_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "svc = SVC()\n",
    "bnb = BernoulliNB()\n",
    "lrc = LogisticRegression()\n",
    "rfc = RandomForestClassifier(n_estimators=100, random_state=42)\n",
    "abc = AdaBoostClassifier(n_estimators=100, random_state=42)\n",
    "bc = BaggingClassifier(n_estimators=100, random_state=42)\n",
    "etc = ExtraTreesClassifier(n_estimators=100, random_state=42)\n",
    "\n",
    "clfs = [svc, bnb, lrc, rfc, abc, bc, etc]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
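    "One way to evaluate a classifier is to hold out part of the training data as a validation set. To get a more reliable estimate, though, the data usually has to be split into training and validation sets several times and the scores averaged. sklearn's `cross_val_score` does this for us."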
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SVC - 82.15% - 0.27s\n",
      "BernoulliNB - 79.12% - 0.03s\n",
      "LogisticRegression - 81.70% - 0.03s\n",
      "RandomForestClassifier - 80.13% - 1.22s\n",
      "AdaBoostClassifier - 81.36% - 0.98s\n",
      "BaggingClassifier - 80.02% - 2.57s\n",
      "ExtraTreesClassifier - 80.02% - 1.13s\n"
     ]
    }
   ],
   "source": [
    "from time import time\n",
    "\n",
    "for clf in clfs:\n",
    "    start = time()\n",
    "    score = cross_val_score(clf, X_train, y_train, cv=5, scoring=\"accuracy\").mean()\n",
    "    duration = time() - start\n",
    "    clf_name = clf.__class__.__name__\n",
    "    \n",
    "    print(\"{} - {:.2%} - {:.2f}s\".format(clf_name, score, duration))"
   ]
  },
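  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the repeated splitting concrete, the sketch below writes out by hand roughly what `cross_val_score` does and compares the two. It uses synthetic data from `make_classification` rather than the Titanic features, and the sample size, feature count and `max_iter` value are arbitrary choices for the example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import StratifiedKFold, cross_val_score\n",
    "\n",
    "X, y = make_classification(n_samples=200, n_features=8, random_state=0)\n",
    "cv = StratifiedKFold(n_splits=5)\n",
    "\n",
    "# Manual loop: fit a fresh model on four folds, score it on the remaining one\n",
    "manual = []\n",
    "for train_idx, val_idx in cv.split(X, y):\n",
    "    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])\n",
    "    manual.append(model.score(X[val_idx], y[val_idx]))\n",
    "\n",
    "# The same evaluation done by cross_val_score on the same splits\n",
    "auto = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring='accuracy')\n",
    "print(np.mean(manual), auto.mean())\n",
    "```\n",
    "\n",
    "Passing the same `cv` object to both is what makes the two sets of fold scores comparable."
   ]
  },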
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
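    "These results show that every model other than naive Bayes reaches an accuracy of at least 80%. The next step is to tune the parameters of each model and find its best settings."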
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tuning model parameters\n",
    "\n",
    "Taking the random forest classifier as an example, `GridSearchCV` is used to search for the best parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import GridSearchCV\n",
    "\n",
    "random_forest = RandomForestClassifier()\n",
    "\n",
    "param_grid = [{\n",
    "    'n_estimators': [40, 50, 70, 100],\n",
    "    'max_depth': [3, 4, 5, 6, 8],\n",
    "    'min_samples_leaf': [1, 2, 3]\n",
    "}]\n",
    "\n",
    "grid_search = GridSearchCV(random_forest, param_grid, cv=5, scoring='accuracy', n_jobs=-1)\n",
    "\n",
    "_ = grid_search.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.8282828282828283,\n",
       " {'max_depth': 5, 'min_samples_leaf': 3, 'n_estimators': 40})"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grid_search.best_score_, grid_search.best_params_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tuned random forest classifier reaches a best cross-validated accuracy of 82.8%."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Combining multiple classifiers\n",
    "\n",
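    "A Chinese proverb says that three cobblers together outdo Zhuge Liang. Several classifiers may give different predictions for the same sample, and letting them vote on the result is often more reliable, because each classifier tends to base its decision on something different: some partition the feature space, others decide from probability distributions. Combining the results of several kinds of classifier often gives better performance.\n",
    "\n",
    "If the classifiers are all of the same type, however, it is as if part of the voting group belonged to one faction that tends to vote the same way, so classifiers of the same kind should be avoided in the vote.\n",
    "\n",
    "sklearn's VotingClassifier makes this voting process easy:"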
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8282130175080601"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.ensemble import VotingClassifier\n",
    "\n",
    "svc = SVC(C=0.3, degree=2, kernel='linear', probability=True)\n",
    "lrc = LogisticRegression(C=0.8, penalty='l1', solver='liblinear')  # liblinear supports the l1 penalty\n",
    "# rfc = RandomForestClassifier(n_estimators=40, max_depth=5)\n",
    "abc = AdaBoostClassifier(n_estimators=60)\n",
    "bc = BaggingClassifier(n_estimators=50, bootstrap_features=True, max_features=0.9, max_samples=0.9)\n",
    "# etc = ExtraTreesClassifier(n_estimators=100)\n",
    "\n",
    "clfs = [svc, lrc, abc, bc]\n",
    "\n",
    "eclf = VotingClassifier(estimators=[\n",
    "    (clf.__class__.__name__, clf) for clf in clfs\n",
    "], voting='hard')\n",
    "\n",
    "cross_val_score(eclf,  X_train, y_train, cv=5, n_jobs=-1).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
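    "Classifiers can output class probabilities. A probability of 0.9 for the positive class means the classifier is fairly sure, while 0.5 means it is undecided. Suppose that in some decision all the other classifiers output 0.49 and only one outputs 0.99. Most of them are unsure while one is very confident, so this vote can follow the confident one.\n",
    "\n",
    "This strategy is soft voting: the probabilities output by the classifiers are averaged, and the sample is classified as positive if the mean is greater than 0.5 and negative otherwise."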
   ]
  },
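  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 0.49 / 0.99 scenario above can be worked through numerically. This is a plain NumPy sketch of the two voting rules for three hypothetical classifiers, not a call into `VotingClassifier` itself.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Positive-class probabilities from three hypothetical classifiers\n",
    "probs = np.array([0.49, 0.49, 0.99])\n",
    "\n",
    "# Hard voting: each classifier casts one vote, the majority wins\n",
    "votes = (probs > 0.5).astype(int)\n",
    "hard = int(votes.sum() > len(votes) / 2)\n",
    "\n",
    "# Soft voting: average the probabilities, then threshold at 0.5\n",
    "soft = int(probs.mean() > 0.5)\n",
    "\n",
    "print(hard, soft)\n",
    "```\n",
    "\n",
    "Hard voting returns the negative class because two of the three votes are negative, while soft voting returns the positive class because the mean probability is about 0.657."
   ]
  },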
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8315901520221207"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "eclf = VotingClassifier(estimators=[\n",
    "    (clf.__class__.__name__, clf) for clf in clfs\n",
    "], voting='soft')\n",
    "\n",
    "cross_val_score(eclf,  X_train, y_train, cv=5, n_jobs=-1).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
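    "`cross_val_score` only evaluates the model: it clones the original model and cross-validates the clones several times. To obtain a model that can make predictions, the model has to be trained on all of the training data."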
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "VotingClassifier(estimators=[('SVC', SVC(C=0.3, cache_size=200, class_weight=None, coef0=0.0,\n",
       "  decision_function_shape='ovr', degree=2, gamma='auto_deprecated',\n",
       "  kernel='linear', max_iter=-1, probability=True, random_state=None,\n",
       "  shrinking=True, tol=0.001, verbose=False)), ('LogisticRegression', LogisticRegressio...imators=50, n_jobs=None, oob_score=False, random_state=None,\n",
       "         verbose=0, warm_start=False))],\n",
       "         flatten_transform=None, n_jobs=None, voting='soft', weights=None)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "eclf.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prediction\n",
    "\n",
    "Predict, save the result and upload it to Kaggle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred = eclf.predict(X_test)\n",
    "\n",
    "submission = pd.DataFrame({\n",
    "    \"PassengerId\": test['PassengerId'],\n",
    "    \"Survived\": y_pred\n",
    "})\n",
    "\n",
    "submission.to_csv(\"../data/titanic/submission.csv\", index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The submission scored 0.78947 on Kaggle."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
